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(  MAITICR  I  -  INTRODUCTION 


The  conventional  classification  of  inultispectral  image  data  collected  by 
remote  sensing  devices  such  as  the  multispectral  scanners  aboard  the  Landsat 
or  Skylab  sateilites  has  usually  been  performed  such  that  each  pixel  is  classified 
by  using  spectral  information  independent  of  the  neighboring  pixels.  There  is 
no  provision  for  using  the  spatial  information  inherent  in  the  data.  In  many 
cases  there  are  available  other  sources  of  data  which  an  analyst  can  use  as  well 
as  spatial  information  to  establish  a  context  for  deciding  what  a  particular 
pixel  in  the  imagery  might  be.  By  utilizing  this  contextual  information,  it  may 
be  possible  to  achieve  an  improvement  in  classification  accuracy.  For  example, 
in  the  case  of  forestry,  various  tree  species  are  known  to  exhibit  growth 
patterns  dependent  upon  topographic  position.  When  this  fact  is  used  along 
with  spectral  and  spatial  information,  a  classification  with  enhanced  accuracy 
can  be  obtained  [1,2]. 

The  supervised  relaxation  operator  which  combines  information  from 
spectral,  spatial,  and  ancillary  data  to  cla.s.sify  multispectral  image  data  is  part 
of  the  subject  of  this  report.  Relaxation  operations  are  a  class  of  iterative, 
parallel  techniques  for  using  contextual  information  to  reduce  local  ambiguities 
[3].  Such  techniques  have  been  found  to  be  useful  in  many  applications  such  as 
edge  reinforcement  [4],  image  noise  reduction  [5],  histogram  modification, 
thinning,  angle  detection,  template  matching,  and  region  labelling  [6].  Its 


convorgoncp,  the  derivation  of  ootnpaf ibility  coedicients,  and  the  theoretical 
foundations  of  relaxation  labeling  processes  have  been  described  in  (7,8,9,10,11). 
A  modification  to  existing  probabilistic  relaxation  processes  to  allow  the 
information  contained  in  the  initial  labels  to  exert  an  influence  on  the  direction 
of  relaxation  throughout  the  process  has  been  described  in  [12].  This  modified 
relaxation  method,  called  supervised  relaxation  labeling,  has  been  extended  to 
incorporate  one  ancillary  information  source  into  the  results  of  an  existing 
classification  [2],  In  this  thesis,  a  method  for  integrating  image  data  from 
multiple  ancillary  data  sources  is  described  in  Chapter  2.  Based  on  the 
approach  in  [13],  the  convergence  property  of  the  supervised  relaxation 
operator  is  presented.  The  supervi.sed  relaxation  operator  is  generalized  and 
demonstrated  experimentally  to  combine  information  from  spatial  and  multiple 
ancillary  data  sources  with  the  spectral  information  for  the  classification  of 
mult ispectral  imagery  with  multiple  classes. 

Due  to  the  high  computational  complexity  of  such  operations  and  the 
availability  of  low  cost  microprocessors,  architectures  involving  multiple 
proces.sors  are  very  attractive.  Several  parallel  organizations  have  been 
proposed,  principally  SIMI)  and  MIMD  architectures.  In  Chapter  3  of  this 
report  we  focus  our  attention  on  performance  me-asures  for  parallel  proces.sors 
and  interconnection  networks  for  multiprocessor  systems.  Section  3.1  will 
present  the  application  of  S-Nets  [I  l.l-'ij  to  modeling  the  maximum  likelihood 
algorithm  in  SIMI)  and  pipeline  implementations.  Several  alternative 
performance  measures  for  the  parallelism  inherent  in  the  algorithm  are  also 
described.  I'lie  algorithm  execution  tinu's  dcrivi'd  based  on  the  analysis  of 
implicit  and  explicit  operations  in  tin*  algorithm  for  SlMI)  and  MIMI)  modes  of 
parallelism  are  compared  and  <Iiscnss<'d  in  .Section  3  2.  In  Scition  3.3,  th(> 
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determination  of  the  optimal  number  of  processing  elements  in  terms  of 
execution  time  and  system  cost  for  a  given  image  size  based  on  the  measures  of 
evaluating  the  performance  of  an  algorithm  is  discussed.  Finally,  Section  3.4 
describes  how  two  multistage  networks  [16,17,18,10],  the  cube  network  and 
ADM  network,  can  be  configured  to  support  an  implementation  which  utilizes 
both  SIMD  and  MIMI)  processing  components. 


CHAPTER  2  -  SUPERVISED  RELAXATION  OPIORATOR 


In  this  chapter.  Section  2.1  conl.iins  a  slep-hy-st ej)  (lescri|)tion  of  the 
supervised  relaxation  operator.  In  Section  2.2,  a  method  for  the  integration  of 
image  data  from  multiple  auxiliary  data  sources  will  he  given.  Section  2.3 
discusses  the  convergence  property  of  the  supervised  relaxation  algorithm. 
Finally,  Section  2.4  presents  experimental  results  on  a  I82-by-182  pixel 
Landsat  image. 

2.1  Derivation  of  the  Supervised  Relaxation  Algorithm 

In  this  section,  the  derivation  of  the  supervised  relaxation  operator  from 
the  original  relaxation  operator  [3)  is  described  step  by  step,  and  an  heuristic 
interpretation  of  the  supervised  relaxation  operator  is  given. 

2.1.1  Probabilistic  Relaxation  Operator 

Most  classifiers  used  with  remote  .sensing  data  are  pixel-specific.  Each 
pixel  is  classified  by  using  spectral  information  independent  of  the  classification 
of  the  neighboring  pixels.  No  spatial  or  contextual  information  is  used. 

Relaxation  labeling  processes  are  a  class  of  iterative,  parallel  techniipies 
for  using  contextual  information  to  disambigiiatt'  probabilistic  labelings  of 
objects  [3|.  They  make  use  of  two  (lilferent  sources  of  information,  a  priori 
neighborhood  models  and  initial  observations,  which  interact  to  produce  the 


final  labelings.  The  relaxation  |)roe<*s.ses  involve  a  si-l  of  objects,  A  =  { 
ap  Bin,  a^},  and  the  relationship  of  each  object  to  its  neighbors.  Attached  to 
each  of  the  objects  is  a  set  of  labels,  A  =  {X|  Xo,  •  ,  X^},  where  each  label 
indicates  a  possible  interpretative  as.sertion  about  that  object;  for  example,  the 
objects  might  l>e  image  pixels  and  the  labels  might  be  spectral  cla.ss  names. 
I’he  relaxation  algorithm  attempts  to  iisi*  constraint  or  compatil)ility 
relationships  defined  over  [lairs  of  labels,  po.ssibIe  interpretations,  attached  to 
current  and  neighlioring  objects  in  order  to  eliminate  inconsistent  combinations 
of  labels.  (“Current”  refers  to  an  object  which,  at  the  moment,  is  the  focus  of 
attention.) 

A  measure  of  likelihood  or  confidence  is  associated  with  each  of  the 
po.ssibIe  labels  attached  to  objects.  This  measure  is  denoted  by  Pi(X).  These 
likelihoods  satisfy  the  condition 

VPi(X)  =  I,  for  all  a,<  A,  0  <  Pi(X)  <  1. 

X(A 

In  this  probabilistic  mode),  the  likelihoods  estimated  for  an  object’s  labels 
are  updated  on  the  basis  of  the  likelihoods  distributed  among  the  labels  of  the 
neighboring  objects.  These  likelihoods  interact  through  a  set  of  compatibility 
coeflicients  that  are  defined  for  each  pair  of  labels  on  current  and  neighboring 
objects.  The  compatibility  coeflicients  are  determined  a  priori  based  on  some 
foreknowledge  of  or  on  a  typical"  model  for  the  image  to  be  classified.  More 
specifically,  for  a  given  object  aj,  the  likelihood  Pj(X)  of  a  given  label  X  should 
increase  if  the  neighboring  objects'  labels  with  high  likelihoods  are  highly 
comiiatible  with  X  at  a^.  Conversely.  P,(X)  should  de(  ri'ase  if  neighboring  high- 


likelihood  labels  are  incompatible  with  X  at  a,.  On  the  other  hand,  neighboring 
labels  with  low  likelihoods  should  have  little  influence  on  Pi(X)  regardless  of 
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their  compatibility  with  it.  Let  X  be  a  label  for  a  neighboring  object  and  X  for 
the  current  object.  These  characteristics  can  be  summarized  in  tabular  form  as 
follows: 

Compatibility  of  X'  with  X 


High 

Low 

High 

+ 

- 

Low 

0 

0 

where  +  means  that  Pi(X)  should  increase,  -  means  that  it  should  decrease, 
and  0  means  that  it  should  remain  relatively  unchanged.  The  coeflicient 
'Yij(X,x')  is  a  measure  of  the  compatibility  between  label  X  on  current  object  a; 
and  label  X’  on  neighboring  object  aj.  These  compatibility  coeflicients  are 
defined  over  the  range  [-1,  1].  The  aim  is  that  the  should  behave  as 
follows: 

7ij(X,X' )  >  0,  if  X  on  aj  frequently  co-occurs  with  X  on  aj. 

=  1,  if  X  on  a;  always  co-occurs  with  on  aj. 

<  0,  if  X  on  aj  rarely  co-occurs  with  X  on  aj. 

=  -I.  if  X  on  a;  never  co-occurs  with  X  on  a^. 

=  0,  if  X  on  .'ij  occurs  indcpciHieiil  ly  of  X  on  a^. 

With  these  compatibilities,  the  cfuitextual  knowledge  is  incorporated  into  the 
probabilistic  relaxation  o|>erator. 

The  initial  lalu’l  likelihoods  are  provided  from  a  classification  of 
mult  ispect ral  Landsat  d.ita.  mull  is|)('ctral  scanner  (MSS)  g.itliers  radiaiici' 
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data  in  various  sections  of  the  electromagnetic  spectrum  (wavelength  bands). 
For  example,  the  Landsat  MSS  has  four  wavelength  bands  in  the  visible  and 
near  infrared  spectrum.  Such  remotely  sensed  data  have  been  collected  and 
stored  in  digital  format. 

The  first  step  in  multispectral  cla.s.v:ifi(  af  ion  begins  with  the  selection  of  an 
MSS  data  set  which  h:is  sullicient  (piality  so  that  the  classes  of  interest  on  the 
land  covered  can  be  identified  with  the  desired  accuracy.  After  choosing  the 
best  available  data  set  for  analysis,  the  study  area  within  the  data  is  located 
and  the  reference  data  (such  as  aerial  photography,  maps,  etc.)  are  correlated 
with  the  multispectral  scanner  data.  These  reference  data  provide  the  key  to 
relating  succc.ssfully  the  .spectral  respon.ses  in  the  data  to  the  cover  types  on  the 
ground. 

The  .second  step  is  the  .selection  of  the  training  samples.  A  common 
procedure  for  selecting  the  training  areas  is  to  use  the  available  reference  data 
to  identify  areas  that  contain  the  information  classes  of  interest.  The  images  of 
these  areas  are  then  identified  in  the  multispectral  scanner  data.  These 
training  samples  are  used  (o  determine  ()arame(ers  for  the  pattern  recognition 
algorillims,  elfeclively  "training"  the  computer  to  recognize  the  classi's  of 
interest.  Later  when  the  classilicat  icui  opt'ration  is  carried  out  by  the  [)attern 
recognition  algorithm,  each  data  point  to  be  classified  is  compared  with  the 
training  samples  for  each  class,  and  the  pixel  is  assigned  to  the  class  it 
ro.semble.s  most  closely. 

'I'he  third  step  is  to  use  these  training  samples  to  (hdine  training  cl.asses. 
The  training  classes  ar<'  often  di  iracterized  iti  terms  of  the  mean  vectors  and 
covariance  matrices  by  the  clustering  algorithm.  Clustering  may  be  used  to 
identify  natural  spectral  groupings  of  pixels  in  the  training  samples.  These 
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natural  groupings,  called  “spectral  classes,”  are  used  as  candidate  training 
classes.  To  be  sure  of  the  reliability  in  identifying  the  information  class  of  each 
cluster  class  obtained,  all  available  reference  data  are  used.  This  step  is  the 
most  important  in  ensuring  that  the  classifier  is  correctly  trained. 

The  final  step  is  classification  using  a  maximum  likelihood  classification 
algorithm.  The  set  of  discriminant  functions  for  the  maximum  likelihood 
classification  rule,  usually  specified  in  terms  of  mean  vectors  and  covariance 
matrices  of  classes,  is  derived  from  the  statistical  decision  theory  so  as  to 
minimize  the  probability  of  making  an  erroneous  classification.  When  the  data 
values  of  a  pixel  are  substituted  into  all  of  the  functions,  the  pixel  is  assigned 
to  the  class  which  produces  the  largest  value. 


The  initial  likelihoods  of  the  labelings  for  each  pixel  are  provided  by  the 
values  of  the  discriminant  functions  of  the  classification  algorithm.  Kach 
probability  is  then  updated  by  a  rule  of  the  form: 


PjO^llX)  [I  +  (|i<M(X)] 
Vl>p)(X)  [1  +  (1|<'''(X)] 

X<A 


(2.1.1) 


where 


qi"'’(>^)  =  X:(iiiL^Oij(x,x')if»(x')  (2.1.2) 

j<J  X 

where  k  is  the  iteration  of  the  relaxation  process  and  <|/*'*(X)  denotes  the  kth 
estimate  of  (he  neighborhood  conlribiition.  .1  defines  (li(>  neighborhood  about 
the  current  |)ixel  being  considered.  I'he  coellieients  d-  rt‘presen(  (he  possible 

weighting  consiants  (which  satisfy  ^djj  =  I)  over  the  neighboring  obji'cts  Uj. 

j.J 

These  coefficients  insure  (hat  qj  is  in  the  range  [-1,1]  and  allow  different 


neighbors  in  J  to  have  different  degrees  of  influence  in  the  neighborhood 
contribution.  Indeed,  if  r/‘‘’(X')  is  high,  and  nij(X,X’)  is  very  positive  or  very 
negative,  then  the  label  X'  at  aj  makes  a  substantial  positive  or  negative 
contribution  to  qp'(X);  while  if  r/'''(X')  is  low,  x'  at  aj  makes  relatively  little 
contribution  to  q/''*(X)  regardless  of  the  value  of  7jj(X.X  ).  Therefore,  a  very 
positive  or  very  negative  contribution  to  qpl  contributes  an  increase  or 
decrease  to  since  r,*'''(X)  is  obtained  by  multiplying  P;''"'  by  (1  +  qj^'^l), 

whereas  a  small  contribulion  to  qp*  contributes  littb'  change  to  P,**^ Mere, 
the  denominator  in  I'qualion  (*2.1.1)  guarantees  llial  all  the  Ps  sum  to  1. 
Moreover,  they  remain  nonnegative,  since  is  in  the  range  [-1,1)  provided 

that  <lij  =  I,  so  I  +  (\l^^  is  nonnegative. 

J 

This  rule  is  used  to  update  the  likelihood  of  (>a(h  label  on  each  object  in 
paralbd,  and  is  then  iter;ite(l  until  no  further  changes  occur.  .\t  this  point,  we 
say  that  the  (inal  lalx'lings  rtsich  a  balance  in  consistency  betw(>en  spectral  and 
spatial  (or  contextual)  data  sources  of  information. 

2.1.2  Compatibility  Coellicients  as  Conditional  Probabilities 

The  original  iterative  r<ile  of  lupiation  (2.1.1)  uses  a  priori  knowledge 
embedded  in  the  d|j  and  7|j  functions  to  disambiguati'  tlii>  initial  likelihoods.  In 
the  original  design  [3],  the  7ij(X.X  )  coellicients  were  regarded  as  representing 
correlation  functions.  This  association  turns  out  to  be  in  general  inadequate. 
The  representation  of  conditional  likelihoods  seems  to  be  appropriate.  Such 
conditional  measures  are  in  the  form  of  “given  X'  on  aj,  how  compatible  is  this 
with  X  on  a,?".  Now,  we  shall  transform  the  ui)daling  ruh>  of  Equation  (2. 1. 1) 


•j  W-V 
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compatibilities,  [-1,1],  must  be  sealed  into  [0,1).  The  simplest  transformation 
is: 


Pij(X|X')  = 


-iij(X,X  )  +  1 


a 


(2.1.3) 


where  Pij(X[  X  )  is  to  be  read  as  the  conditional  probability  that  aj  has  label  X 
given  aj  has  X o  is  a  suitable  constant.  Substituting  Kquation  (2.1.3)  into 
Equation  (2.1.1)  gives: 


p/'^'ix)]!  +  x:<iij:^!^'Pij(x)x')-  i]if(x')} 

p.(lc  +  l)(X)  =  - 1 - - 

EPi‘''(M{l  +  EdijV|aP..(X|  X')  -  l]Pj(')(X')} 

X  j  X 

if'lX))!  +  V‘«ij^:«Pij(X|x')lf  >(X')-  1} 

_  _ i  X _ 

V:p/M(X){I  +  Vd.jVolVj(X|X')lf)(X')-  1} 
X  j  X 

P/'‘’(X){VdijV:al>ij(X|X')P/‘‘)(X')! 

_ _ i  X _ 

V:Pi('''(X){^\lijVaPij(X|x'  )Pj"^>(X')) 

X  j  X 

P/'"(X)|Vdi^Vp..(X|x')lf»(X')S 

_ _ _ _ _ 

X:Pi‘''’(X){Vd;jVp.,(X|X')lf)(X')} 

X  j  X 


Pi‘'^'(X)Qi<'^>(X) 
vif  )(X)Q/'''(X) 


where 


(2.1.1) 


(2.1.5) 


Qili<)(A)  =  Vd,:^MVX|X  )lf)(X  ) 

j.J  X 

Thi.s  i.s  an  analogous  form  of  (lie  u|)<lating  rule  of  liquation  (2.1.1)  with 
compatibility  cooflicicnts  ri'placcd  by  conditional  lik<‘lili<»ods  which  .‘■atisfy 


2.1.3  Problem 

The  problem  we  want  to  deal  with  here  is  how  to  incorporate  information 
from  multiple  ancillary  data  sour<‘('s  into  the  results  of  an  I'xisting  classification 
of  remotely  sensed  data.  ( 'otivenlioiial  classification  of  cover  types  in  remof('ly 
sensed  data,  ba-sed  only  ufton  spectral  information,  might  not  be  enough.  In 
many  cases,  there  are  other  sources  of  practical  data  available  which  can  be 
used  along  with  the  spectral  information  to  improve  the  classification.  For 
example,  various  tree  species  are  known  to  exhibit  growth  patterns  dependent 
upon  topographic  position,  as  shown  in  Fig.  2.1.1,  so  that  tree  species  or  other 
forest  classifications  couhl  be  improved  by  combining  spectral  data  with 
elevation,  slope,  and  f)ther  topographically  r<'lated  information.  As  shown  in 
Fig.  2.1.1,  the  current  pixel  with  remotely  sensed  data  Xo  X3,  x^]  gathered 
from  four  wavelength  bands  from  visible  and  near  infrared  spectrum  has  other 
ancillary  data  yj.  y2,  y^  obtaine<l  from  information  of  elevation,  slope  and 
aspect,  d'he  inff>rmaf ioti  fron?  ancill.ary  rl.ita  soiirct’s  is  .assumed  to  be  available 
in  the  form  of  a  set  of  likelihoods.  The  prixM-dures  proposed  below  are  post- 
cla.ssification  techniques,  in  (hat  the  influence  of  the  available  ancillary  data 
can  be  imposed  on  an  existing,  spectrally  determined,  classification  result. 


(Spectral  Data 
collected  from 
four  wavelength 
bands 


Elevation 

Slope 

Aspect 


Fig.  2.1.1.  I’se  of  Ancillary  Data  [2j 


2.1.4  Approach;  Supervised  Fielaxation  l.abeling 


.\  method  of  supervised  relaxation  labeling  [12.2)  has  been  proposed  for 
incorporating  one  source  of  ancillary  information  into  a  classification  in  a 
quantitative'  manner.  'Phis  method  was  previously  applied  only  to  a  two-class 
(spruee-lir  vs.  others)  ])rol)lem.  'I'lie  su|u>rvised  relaxation  labeling  can  develop 
consistency  between  spi'ctral,  s|>atial.  and  multiple  ancillary  data  sources  of 
information  for  problems  with  multiph'  classc's.  1  he  classification  used  for  this 
purpose  was  produced  from  multispectral  Skylab  imagery.  For  example, 
distribution  of  the  classes  of  tree  sjiecies  with  respect  to  the  elevation  in  one 
specific  area  is  shown  in  Fig.  2. 1..!  I'or  .simj>licity,  tlii'  area  is  assumed  to  be 
labeled  into  three  classes,  .Spruce  Fir,  Douglas  A  While  Fir,  and  Pondero.sa 
Pine.  Put  the  method  can  be  applied  in  a  similar  w.ay  to  problems  involving 
more  chisses.  In  this  area,  there  are  also  data  describing  the  elevation,  slope, 
and  aspect  preferences  of  the  various  tree  species,  along  with  digitized  terrain 
maps  of  elevation,  slope,  and  aspect.  For  the  present  study,  elevation  was 
chosen  as  the  most  important  ancillary  data  variable  for  improving 
cl:issifi<  ation  accuracy  over  an  existing  classification  obtained  from  spectral 
data  alone.  An  elevation  map  of  the  area  is  shown  in  Fig.  2.1.2  [2J  and  the 
distributions  of  three  cla.sses  (assume  their  labels  are  X|,  X2,  and  X^)  with 
respect  to  elevation  are  shown  in  F  ig.  2.1.3.  CJiven  the  elevation  of  the  current 
pixel  known  from  F'ig  2  12  the  probability  of  finding  classes  labeled  X],  Xo, 
and  X3  can  be  seen  in  Fig.  2  1.3.  Let  tlie  likelihoods  be  x,  y,  and  z.  Then  a  set 
of  relativ"  likelihoods  (corresponding  t<i  each  pixel  in  an  image)  for  each  of  the 
labels  for  the  current  pixel  a;  is  as  follows; 


hlfvation  (mrtors) 


Fig.  2.1.3.  Dislrihijlion  Functions  Illustrating  Mic  IJclativc  Likelihood  of 
Finding  Three  Classes  at  X'arious  Altitudes  [I] 
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where 


X 

X  +  y  +  z 


y 

X  +  y  +  z 


7, 

X  +  y  +  z 


j 


These  likelihoods  have  been  derived  from  the  ancillary  data  (i.e.,  elevation) 
apart  from  the  spectral  data  used  in  determining  the  PjlAj’s  in  Equation  (2.1.1) 
and  Equation  (2.1.1). 

If  there  are  multiple  sources  of  ancillary  data  available,  how  can  we 
incorporate  these  multiple  ancillary  sources  into  the  result  of  an  existing 
classification?  Let  us  denote  the  set  of  likelihoods  derived  from  the  first 
ancillary  data  source  for  pixel  a;  as  <l>j'  =  I<^iHX,),  ^j'lXo),  0i'(X3))^  and  that 
derived  from  the  second  ancillary  data  source  for  pixel  aj  as 
‘^r  ~  (<'i"(Xi).  <;^,"(X.j),  0i*(X3)|^.  Then  the  final  .set  of  relative  strengths  (that 
describe  the  relative  likelihood  of  the  labels  for  the  pixel  a;)  derived  from  these 
two  ancillary  data  sources  is  denoted  as  =  [0i(X|),  (^ilXo),  0i(X3)]^  where  4*; 
has  the  form  of  an  elementwi.se  product  of  two  vectors  and  4>j^ 


m 


s  -V.-. 


<I»i  =  ^ilXo) 


^'(Xil^rix, )+<;>, '(X2)<ir(Xo)+0|M3pi2(X3) 

_ «^i'(Xo)<^r(X2) _ 

‘^i'l  X I  )cJi"(  X , )  +  d|'(  Xo)<;i|-(  X., )  +«ii'(X3  )c>i“(  Xj) 

_ cVlX^KylX;,) _ 

'Xi'(X ,  )o,-(X , )  +  cv'(X,.)Of(X2)  +  Oi'  l3l)i->  1 3 


whore 


^‘i>l(Xj)  =  1 
j  =  i 

The  final  sot  of  rolalivo  stroiifftlis  has  hoc'ii  norinalizc'd  If)  make  the  siiiii  of  all 
tho  terms  equal  1.  Of  coiiiso.  this  approach  can  ht*  Roiu-ralizod  to  deal  with 
problems  with  mult  i[)le-<  lass,  multiple-ancillary  data  sources.  Therefore,  the 
modified  supervised  relaxation  procedure  is  adopted  to  incorporate  the  0j(X)'s 
and  thus  allows  multiple  sources  of  ancillary  data  to  bias  the  outcome  of  the 
set  of  likelihoods  for  each  pixel  in  the  image  at  each  iteration.  If  I  Ik*  current 
favored  label  on  the  |)ix('l  is  also  strongly  supported  by  the  amillary  data  [as 
expressed  by  0|(X)],  then  its  likelihood  is  s( reiigl  heiied  |)rior  to  moving  to  the 
next  iteration.  Conversely,  if  the  current  favored  label  is  not  strongly 
supported  by  the  ancillary  data,  then  its  jfrobabilily  is  wi'akened  Ix'fore 
proceeding.  From  the  form  of  <t>i,  it  is  easy  to  realize  the  reason  why  an 
elementwise  product  rule  is  chosen  to  combine  the  sets  of  likelihoods  derived 
from  multiple  ancillary  sources.  Namely,  any  label  (or  class  name)  with  a  very 
small  value  of  probability  derived  from  ancillary  data  source  will  force  that 
label  to  have  only  very  limited  ellects  on  the  next  estimate  for  the  current  pixel 
a|  even  though  other  aticillary  sources  have  evidenct'  to  sup|)ort  this  lalxd.  It 
further  prevents  the  additive  accumulation  of  the  small  responses  from 
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ancillary  data  sources  due  to  some  inaccurate  estimations.  The  strategy  of 
modified  supervised  relaxation  labeling  can  he  implemented  quantitatively  by 
defining 


-  I  +  ./iNo(X)  -  Ij.  Xo\ 


C.’.l.b) 


where 


N  is  the  total  numlier  of  possible  labels  (i.e..  classes) 
d  h  R  weighting  constant  to  be  determined 
Through  the  parameter  /i,  the  pre.sent  method  allows  the  relative  influence  of 
spectral  and  ancillary  data  soiirci's  to  be  varii'd.  A  large  J  reflects  a  liigh 
degree  of  confidence  in  the  accurai  v  of  ancillaiv  data  sources.  .Vt  tin-  k  +  1  Ih 
iteration,  Kquation  (2.1.0)  is  used  to  modify  the  label  weights  according  to 


|,,(k  +  i)(X)+  = 


l>i<'^^')(X)Ci(X) 

Vl»p-^')(X)f,(X) 

X-A 


where  the  denominator  is  a  normali/ing  factor.  If  >’  —  0.  thi'ii  t  |(X)  ~  1  and 
I)p  +  l)(j^)^  -|.,(k  +  i)|X)  whicli  me.ans  the  ancillary  dat.a  hav(>  no  inlliienc(>  at 
all.  If  the  ancillary  data  sourc<'s  ha\<'  pro\  idl'd  no  informatimi  (or  no 
preference  for  any  label  for  the  |)i\el  aj|,  then  (^j(X)  =  N  '  for  X<.\.  This  leads 
to  rl‘‘^')(X)‘"  =  ri"‘'^'>(X)  i.e..  the  ancillary  data  sources  have  no  influence  on 
the  jirogress  of  the  relaxation. 

Snbst  it  Ml  ing  la|iia(ion  (2  i  (j  mio  lo|'i:ilion  (2.1.7).  it  becomes 


|,jk  + 


i*r'(X)t.),'yx) 

\;iV'‘'lx)o.'t'(X) 

X.A 


(2.1.8) 


where 


>  •» 


*  r'i(X)  (2,1.fl) 

QiIa'IX)  is  the  original  iiciglihorhood  contribiilion  of  IC(nin(ion  (2.1.'))  aiigmcnli'd 
by  Equation  (2.1.6). 

The  modified  supervised  relaxation  labeling  .starts  with  I’|*®*(X),  X  f  A 
where  P/®*(X)  is  the  initial  |)robabilit y.  In  principle,  these  likelihoods  are 
available  in  the  penult  iiiiatc'  step  of  the  maximiitii  likelihood  ckissilic  at  ion 
obtained  using  spectral  data  only  (i.e.,  just  prior  to  the  pixel  being  labeled 
according  to  the  larg<'st  i)f  those  likidilioods). 

The  required  context  conditional  likelihoods  P;j(XjX  )’s  are  computed  from 
the  original  classification.  A  more  precise  means  for  obtaining  the 
compatibilities  would  be  to  estimate  them  from  some  other  source  of  data 
known  to  be  correct. 

The  final  labeling  achieved  will  re|)resent  a  balance  in  consistency  Ix'lweeii 
the  spectral  information  itnplicit  in  the  ittitial  likelihoods  P|*®)(X),  Xf.\.  ancillary 
information  embedded  in  the  (^j(X)’s,  and  spatial  context  data  incor|)ornted  in 
the  set  of  Pij(X  |  X  )'  s. 

2.2  Method  of  the  Inic'gration  of  Image  Data  from 
.Multiple  .Ancillary  Data  .Sources 

In  Section  2.1.1,  an  elementwise  product  of  two  vectors  <!>/  and  <f>j'  which 
are  derived  from  two  ancillary  data  .sources  for  a  pixel  a;  is  used  to  produce  the 
final  set  of  relative  strengths.  This  section  will  provide  a  justification  for  using 
the  elementwise  product  of  vectors  to  integrate  ancillary  information  from 
mult  iple  sources. 


0i(Xj),  in  Section  2.1.4,  is  actually  the  probability  of  finding  class  Xj  if  the 
elevation  value  at  pixel  a;  is  given.  Indeed,  (JifX;)  can  be  expre.s.sed  as  P(Xj|  x) 
in  which  index  i  is  omitted  and  x  is  the  elevation  value.  That  is,  given  the 
elevation  value  x,  the  probaliility  of  finding  class  Xj  is  l’(Xj|  x).  Similarly,  given 
another  value  y  such  as  slope  or  aspect  angle,  another  probability  l’(Xj|  y)  can 
be  found.  Then,  the  multiplication  of  these  two  valiii's,  I'lXj  x)r(Xj|  y),  can  be 
expressed  as  follows,  if  it  is  assumed  that  P(x  )P(y)  =  P(x,y)  and 
P(x|Xj)P(y|Xj)=P(x,ylXj): 

_  ('(x,yjXj)P(Xj)P(Xj) 


^  P(x.y,Xj)P(Xj) 

P|x,y) 

=  P(Xj|x,y)P(Xp 

N 


(2.2.1) 


where,  in  the  last  step,  I’fX:)  =  —  and  N  is  the  total  number  of  classes.  This 

j  N 

means  that  each  class  is  assumed  to  be  equally  likely.  After  normalization  as 
in  Section  2.1.4,  the  constant  term,  N.  in  the  above  equation  will  be  eliminated. 
Then,  the  probability  of  finding  class  Xj  given  two  ancillary  data  observations  x 
and  y  can  be  derived  from  the  probability  of  finding  class  Xj  given  x  and  the 
probability  of  finding  class  Xj  given  y.  If  the  independence  assumptions  are  not 
valid,  the  derivation  in  IXpintion  (2.2.1)  is  not  valid  either.  In  this  case, 
P(Xj|  x,y)  can  be  expressed  as 


‘2‘> 


P(Xj|x,v)  = 


l>(x.y|Xj)P(Xj) 

l’(x.y) 


(2.2.2) 


The  joint  (list rihiJtion.s,  l*(.'i,yjXj)  niiisl  he  f.slini;)l ed  first  through  tlic  training 
samples,  and  then  the  probability,  P(Xjjx,y),  can  be  calculated. 

Given  more  than  two  indep<‘nd(‘nt  ancillary  data  sources,  the  probability 
of  finding  class  Xj  can  be  derivc'd  in  exactly  the  same  way  as  in  I‘'(|iialion  (2.2.1) 
except  that  the  constant  t«'rin  will  be  different.  Again,  the  constant  term  will 
be  eliminated  after  normalization. 


2.3  (Convergence  Property 

Section  2.3.6  will  discuss  the  convergence  property  of  the  supervised 
relaxation  operator.  The  preceding  material  will  describe  some  theoretical 
background,  most  closely  following  (13j,  step  by  step  in  order  to  reach  the  final 
conclusion  in  the  last  Section  2.3.6. 


2.3.1  Introduction 

In  a  relaxation  operator,  there  are; 

(1)  a  set  of  objects; 

(2)  a  set  of  labels  for  each  object; 

(3)  a  neighbor  relation  over  the  objects;  and 

(4)  a  constraint  relation  over  labels  at  pairs  of  neighboring  objects 

We  shall  denote  the  objects  by  the  variable  i,  if  A,  which  can  take  on  integer 
values  between  1  and  n  (the  number  of  objects),  the  set  of  labels  attached  to 
node  i  by  Aj,  and  the  individual  label  (elements  of  A|)  by  the  variable  X. 


For  simplicity,  we  assiimc  tlml  the  niiml)er  of  lal)els  at  each  node  is  m, 
independent  of  i,  so  that  the  variable  X  takes  on  integer  values  from  1  to  m. 

Constraints  are  only  defined  over  neighboring  nodes.  The  constraints 
allow  for  labels  to  express  a  preference  or  relative  dislike  for  other  labels  at 
neighboring  nodes  by  use  of  weighted  values  representing  relative  preferences. 
That  is,  the  constraints  are  generalized  to  real-valued  compatibility  function 
'/jj(X,X')  signifying  the  relative  support  for  label  X  at  object  i  that  arises  from 
label  X'  at  object  j.  This  support  can  be  either  positive  or  negative. 
Generally,  positive  values  indicate  that  labels  form  a  locally  consistent  pair, 
whereas  a  negative  value  indicates  an  implied  inconsistency.  When  there  is  no 
interaction  between  labels,  or  when  i  and  j  are  not  neighbors,  7ij{X,X' )  is  zero. 

Having  given  the  compatibility  weights,  continuous  relaxation  also  uses 
weights  for  label  assignments.  W<*  denote  the  weight  with  which  label  X  is 
assigned  to  node  i  by  P,(X),  and  will  require  that 

0  <  Pi(>^)  <  I.  aH  i. 

and 

ni 

V  Pj(X)  ^  1,  all  i  =  I,....n. 

X  =  I 

The  relaxation  process  iteratively  updates  the  weighted  label  assignments  to  be 
more  consistent  with  neighboring  labels,  so  that  the  weights  designate  a  unique 
label  at  each  node.  Some  have  defined  consistency  as  the  stopping  points  of  a 
standard  relaxation  labeling  algorithm. 
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2.3.2  Consistency 

An  unambiguous  labeling  assignment  is  a  mapping  from  the  set  of  objects 
into  the  set  of  nil  labels,  so  (hat  eacli  object  is  a.ssociate(l  with  exactly  one 
label. 

1  if  object  i  maps  to  label  X 

P  IX)  = 

'  10  if  object  i  does  not  map  to  X 

Note  that  for  each  object  i, 

m 

L'  iMX)  =  i. 

X  =  1 

The  variables  Pj(l),  .  .  .  ,Pi(m)  can  be  viewed  as  composing  an  m-vector  I’j, 
and  the  concatenation  of  the  vectors  P|,  Po,  .  .  .  ,  P„  can  be  viewed  as  forming 
an  assignment  vector  P<R"'".  The  space  of  unambiguous  labelings  is  defined 
by 

K*  =  {PtR""':  P=(P„...,PJ; 

Pi  =  (Pi(l),  ....Pi(m))rR"’' 

Pi(X)  =  0  or  1,  all  i,  X; 

m 

V  Pj(X)  =  I  for  i  =  l,...,n} 

X  =  1 

A  weighted  labeling  assignment  can  be  defined  by  replacing  the  condition 
Pi(X)  =  0  or  1  by  the  condition  0<Pj(X)<l  for  all  i  and  X.  The  space  of 
weighted  labeling  iLssignments  is  defined  by 
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K  =  {P€  R"'":  P  =  (P„  •  ■  ,Pn); 

Pi  =  (Pi(l)....,Pi(in))  ^  R"*; 

0  <  Pi(X)  <  1  for  ail !,  X; 

m 

V  Pi(X)  =  1  for  i  = 

X  =  1 

We  note  that  K  is  simply  the  convex  hull  of  K*.  For  example, 

P  =(Pl.P2) 


=  E  P.(e|)P2{l)-(V,)+  E  P,(e,)P2(2)*(ee,.*‘2)  +  E>e3) 

♦,  =  1  »i=i  *1 


=  P,U)P2(l)*(<^j.?l>  +  P,(2)P2(1)*P‘2.<^’i)  +  Pi('‘^)P2n)'(e3<ei) 


+  P,(l)P2(2)-(e„e2)  +  Pi(2)p2(2)-(e2,C2)  +  P,(3)P2(2) -(63,62) 


+  P,(I)P2(3)-(6„63)  +  P,(2)P2(3)- (62,63)  +  P,(3)P2(3)- (63,63) 


(p,(I)I’-(I) 

(P,(1)+p,(2)+p,(;{))P2(i)I1 

P,(2)P2(I) 

1 

0 

|p,(3)P2(1) 

0 

Pl(l)P2(‘-J) 

0 

P,(2)P2(2) 

< 

(P,(l)  +  P,(2)  +  P, (31)1*2(2) 

Pl(3)p2(2) 

0 

J  . 

V-'.  l ' 


T 


P  l(l)P  2^^) 

0 

+ 

P,(2)P2(3) 

. 

0 

P  i(3)P2(3) 

(P,(1)  +  P,(2)  +  P,(3))P2(3)| 

fPid) 

Pod) 

Pi(2) 

P2(2) 

1p,(3) 

P2(3) 

We  can  generalize  this  as  follows:  let  denote  the  standard  unit  m-vector  with 
a  1  in  the  kth  component.  Then  any  labeling  assignment  1’  in  K  can  be 
expressed  by 

^  m  m  m 

^  ^  ^  -  ■  •  •  -  l’,(0,)l‘.>(02)  •  •  ■  Pn|oJ-(e,,,e,^ - e,J  . 

ei=l  ej=l  e„=l 

Since  each  nm-vector  (e^^,  .  .  .  is  in  K*.  the  above  sum  can  be  interpreted 
as  a  convex  combination  of  the  elements  of  K*.  Note  that  the  sum  over  all  of 
the  coeflicients  is  1  i.e., 


«■,  =  !  o„=l 

Remember  that  no  restrictions  are  placed  on  the  magnitudes  of  the 
compatibilities,  '|jj(X,X' )’s,  and  these  will  be  represented  by  a  matrix  of  real 
numbers  -  the  compatibility  tnatrix. 


Delinitioii  2.3.1:  Let  a  matrix  <>f  compatibilily  and  a  labcding  as^ignnKMit 
(unambiguous  or  not)  be  given.  We  deline  the  s\ipport  for  label  X  at  object  i 
for  the  assignment  P  by 


S,(X)  =Si(X;P)  =  V  V  ^,i.(X,X')Pj(V) 
j  =  i  x'=i 

More  generally,  the  support  values  SjlX)  combine  to  give  a  support  vector  S 
that  is  a  function  of  P,  i.e.  S  =S(P). 

nefinitiori  2. •1.2  (l)e(itie  cotisis) encv  for  iitiaml)igiH)iis  labelings): 

Pet  P<K*  be  an  unambiguous  labeling.  Suppose  that  X, . X,,  are  the 

labels  which  are  a.ssigiied  to  objects  I n  by  the  labeling  P.  That  is, 

P  =  (cX|,exj,  ■  ,  e^J.  The  unambiguous  labeling  P  is  consistent  (in  K*) 

providing. 

S,(X,;  P)  >  S,(X;  P),  1  <  X  <  m 

P)  >  ^^V.P).  1  <  X  <  m 

.\t  a  consistent  unambiguous  labeling,  the  support,  ar  each  objc'ct.  for  the 
assigned  label  is  the  maximum  support  at  that  object.  The  condition  for 
consistency  in  K*  can  be  restated  as  follows: 

ni  _  ni  _ 

V  Pi(X)S,(X;P)  >  V  v,(X)S,(X;P),  i  r:  i . n 

X=1  X=l 

for  all  unambiguous  labelings  v  <  K*. 

Delinition  2.d  3  (ConsisteiH  y  for  weigh t<‘(|  labeling  assigiiMK'nIsI: 

Pet  I’cK  be  a  weighted  labeling  assignment.  The  P  is  consistent  (in  K) 


S  Pi(X)Si(X;P)  >  V  Vi(X)Si(X;P),  i  = 

X=1  X=1 

for  all  labelings  vr  K. 

Definition  ‘2.3. {  (Strictly  coiisistent I: 

Let  Pf  K.  Then  I’  is  strictly  consistent  providing 

m  rrt  __ 

V  Pi(X)Si(X;P)  >  V  v.,(X)Si(X;P),  i  =  l,...,n 

X  =  t  <7=1 

for  all  labeling  assignments  vrK,  v /P. 

Theorem  2.3,1:  A  labeling  PcK  is  consistent  if  and  only  if  PrK: 

5]  'TijlX.X'  )Pj(V  )[vi(X-Pi(X)l  <  0  for  all  v  ^  K. 
i,X,j,X' 

Proof:  See  (13). 

Average  local  consistency  is  defined  as: 

A(P)  =  V  V  Pi(X)Si(X) 
i  =  l  X 

The  individual  components  .Sj(X)  depend  on  P  which  varies  during  the 
relaxation  process,  whereas  consistency  occurs  when  ^  Vi(X)S|(X;P)  is 
maximized  by  v  =  P.  That  is  the  S|(X)  sht>uld  be  fixed  during  the 
maximization. 

Remember  that  the  labeling  space  K  is  the  convex  hull  of  the 


unambiguous  labeling  assignment  space-  K*. 


2.3.3  (ic’omof rical  Striictiir(‘  of  .'\s.>iKiitn<'nl  Spacr 

A  simple  examj)l(';  llit'ii-  .art'  two  objects  with  tliree  possible  labels  for  ('acli 
object.  A  labeling  assignment  consists  of  six  nonnegative  numbers; 

P  =  |Pi,P2)  =  (PiM)-Pi(2),P,(3);  P2(1),P2(2).Po(3))  satisfying 


V  l’.(X)  =  1,  for  i  =  1.2 

x  =  i 

The  locus  of  po.ssible  siibveclors  1',  in  R'  is  shown  in  Tig.  2.3.1.  The  vector 
p=(pi^p2)  can  be  regarded  as  two  points,  each  lying  in  a  copy  of  the 
likelihood  space  shown  in  Fig.  2.3.1.  Thus  K  can  be  identified  with  the  set  of 
all  pairs  of  points  in  two  copies  of  the  triangular  space.  This  can  be 
generalized  to  the  case  with  n  objects  each  with  m  labels.  Then  K  is  more 
complicated.  .\  weighted  labeling  assignment  is  a  point  in  the  assignment 
space  K,  and  K  is  iti  turn  the  convex  hull  of  the  set  ol  unambiguous  labeling 
assignments  K*.  .An  unatnbiguous  assignment  is  compo.sed  of  points  which  lie 
at  vertices  of  their  respective  surfaces. 

The  tangent  space  is  a  surfa<-e  which  when  placed  ai  the  given  point  lies 
■tangent”  to  the  entire  surface.  If  F  is  a  labeling  assignment  in  K,  and  v  is 
any  other  assignment  in  K,  the  dilTcrcnce  vector  d=v-F  is  shown  in  Fig. 
2.3,2.  .\s  V  roams  around  K,  the  s(>i  of  all  possibh'  tangent  directions  at  F  is 
swept  out.  'File  set  of  all  tangent  vectors  at  F  is  therefore  given  by 

Tp  =  Id;  d  =  a(v  -  P),  V  r  K,  o  >  0} 

.Any  tangent  vector  is  composed  of  n  subvectors  so  that  d  =  (dj,  .  .  .  ,d„)  and 

?n 

V  d,(X)  =  V  o-(v,(X)-F,(X))  :-o-(l-l)  -0. 


The  set  of  tangent  vectors  at  the  interior  point  P  consists  of  an  entire 
subspace,  which  is  given  by 

Tp  =  {d  =  (d, - dj;  dic  R-",  V  di(X)=0} 

x=i 

(P  interior  to  K) 

Observe  that  Tp  and  K  are  parallel  flat  surfaces. 

When  P  lies  on  a  boundary  of  K,  the  tangent  set  is  a  proper  subset  of  the 
above  space  Tp  w'here  is  any  interior  point.  That  is,  when  the  assignment 

P  has  some  zero  components,  the  set  of  vectors  of  the  form  fv(v-P)  is 
restricted  to 

Tp  =  {d  =  (d„  .  .  .  ,d„):  dj  e  R-",  V  di(X)  =0, 

x=i 

and  di(X)  >  0  if  P,(X)  =  0} 

2.3.4  Maximizing  Average  Local  ('onsislency 

From  Theorem  2.3.1,  maximizing  A(P)  corresponds  to  finding  a  consistent 
labeling.  The  increase  in  A(P)  due  to  a  small  step  of  length  a  in  the  direction 
u  is  approximately  the  directional  derivative: 

A(P  +  au)-A(P)  A(P  +  toil)  =  grad  A(P)*au 

t=0 

where  ||u||  =  1.  The  greatest  increase  in  A(P)  can  be  expected  if  a  step  is  taken 
in  the  tangent  direction  il  which  maximizes  the  directional  derivative. 
However,  if  the  directional  derivative  is  negative  or  zero  for  all  nonzero  tangent 
directions,  then  A(P)  is  a  local  maximum  and  no  step  should  be  taken.  To  find 
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.1  direction  of  steepest  ascc'iit.  grad  A(r)*u  should  be  maximized  among  the  set 
of  tangf'nt  vectors,  llowevt-r,  it  sullices  to  consider  (mly  those  tangent  vectors 
with  Kuclidean  norm  jjo||  =  1  together  with  u  =  0 

Thus  the  direction  of  steepest  ascent  can  be  found  by  solving  the 
following. 

IToblem  1:  Find  UfTp  015,(0)  such  that  u*q  >  v-q  for  all  vtTp  015,(0), 
where  q  =  “  grad  A(P).  Here  15,(0)  =  { v  r  R""’;  |j  v||  <  1 }. 

When  the  maximum  u*q=0,  we  will  agree  that  u=0  is  the  best  solution  to 
problem  1.  Conceptually,  starting  at  an  initial  labeling  P,  we  compute 

q  =  -^gradAlP),  and  solve  problem  I.  If  the  resulting  il  is  nonzero,  we  take  a 

small  step  in  the  direction  ii,  and  repeat  the  process.  The  algorithm  terminates 
when  u  =  0. 

\N  hen  I*  is  the  interior  of  the  assignment  space  K,  solving  problem  1 
corresponds  to  projecting  q  onto  the  tangent  space  Tp,  and  then  normalizing. 

Lemma _ L  If  P  lies  in  the  interior  of  K,  then  the  following  algorithm  solves 

problem  I. 

1  III 

(1)  set  C,  =  b  V  ,p(e),  i  =  I . n. 

(2)  set  \V,(X)  --q,(X)  (■„  all  X. 

(3)  set  u,(X)  =:VV,(X)/j(VV||,  all  i,  X. 

lhre,||u!|:- 


\V,(X)- 


i.x 


!/■: 


Proof: 


Since  ||u||  —  1  and 


X=1  X  =  1  lx=l  j 

from  the  definition  of  tangent  space,  it  is  ohvioiis  that 

u  c  Tp  n  B,(0) 

To  observe  that  w  is  the  projection  of  q  onto  Tp,  we  need  to  prove  that 
(q-w)*v  =0  for  all  vcTp. 

\^N"(qi(X)-Wi(X))*Vi(X)  =  Vc|V=0  since  vrTp 


Thus 


vq  =  vw 


for  all 


v(Tp. 


Since  u  f  T^, 


u*q  =  u-w  =  — ^ ’w  =11^1  >1M1 ‘jlili  for  any  vfTpPlBilO)  (note  that 

||s|| 

||v||  <  1).  Since  ||w||  •||^|  >  w*v,  so  we  have 

u’q>u’q  for  all  V  ( '^I'p  n  B|(()) 


That  is,  u  solves  problem  I. 


Combining  tlu'se  results,  we  obt.ain  the  follouing  algorithm  for  finding  a  local 
maximum  of  .\(P). 


V/.- 


’C‘ 

- :  V 


vv->, 

.  •»'  ‘ 
t  -Si  *  ^  « 


,vV'’.v*N 


IVJi  ■  l^'V 
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Algorithm  2.3.1: 

Initialize; 

(1)  Start  with  an  initial  labeling  assignment  P"rK.  Set  k=0.  Loop 
until  a  stop  is  executed: 

(2)  Compute  <1*^  =  gradA(l*'‘). 

—  —1.  __  r 

(3)  Cse  the  algorithm  in  Lemma  I,  with  P  =  F',  q=q  ,  to  find  the 
solution  u*'  to  prol)lem  I. 

(4)  If  u*'  =  0  stop. 

(5)  Set  P**  ^  *  =  F'‘ +  ku'‘,  where  0  <  h  <  Oj,  is  determined  so  that 
P'''^‘fK  The  maximum  step  size  ftjt  is  .some  predetermined  small 
value,  and  may  decrease  as  k  increases  to  facilitate  convergence. 

(G|  Replace  k  by  k  +  I. 

End  loop. 

In  .summary,  successive  iterates  are  obtained  by  moving  a  small  .step  in  the 
direction  of  the  projection  of  the  gradient  q  onto  the  convex  set  of  tangent 
directions  Tp.  The  algorithm  stops  when  this  projection  is  zero.  \\ C  now  have 
a  method  for  (in<ling  consistent  labeling-',  given  an  initial  labeling  assignment. 
Recall  the  variational  ine(|nalilv  for  eonsislencv  from  theorem  2,3.1. 

)Pj(V)(v.(X)-lMX))  <  0  for  all  v(  K 

i,\  j,X' 


or,  more  generally, 


V  Si(X;P)-(Vi(X)-Pi(X))  <  0  for  all  v  f  K 

i,X 


Hereafter,  we  define  the  components  of  q  by 


^  7ij(X,X')Pj(X')  , 

iX 


that  is,  we  have  set  q  -S(P). 


Observation  2.3.1:  With  q  defined  as  above,  the  variational  inequality  is 


equivalent  to  the  statement 

q  •  t  <  0  for  all  t  (  Tp  . 

That  is,  a  labeling  P  is  consistent  if  and  only  if  q  points  away  frotn  all 
tangent  directions. 

Proof:  We  have  q  =  S,  and  any  tangent  vector  t  at  P  can  be  written  as  a 
positive  scalar  multiple  of  v-P,  where  vfK.  The  observation  follows 
immediately. 

Therefore,  if  at  a  labeling  P,  the  associatt'd  vector  q  points  in  tin*  same 
direction  as  some  tangent  vector,  then  P  is  not  consistent.  So  P  should  be 
moved  in  the  direction  of  that  tangent  vector.  The  process  may  be  repeated 
until  q  evaluated  at  the  current  assignment  points  away  from  all  tangent 
directions.  Then  P  will  be  a  consistent  labeling.  Note  that  q  varies  as  P 
moves,  but  that  generally  q  will  change  .smoothly  and  gradually. 

If  q‘t>0  for  some  tangent  direction  I,  then  the  current  assignment  P  is 
not  consi.sf ent,  and  .should  be  ujxlated.  It  makes  sense*  to  move  P  in  the 
direction  u  that  maximizes  q'u.  Therefore  the  relaxation  labeling  algorithm  is 
given  by  the  following. 


V 


l.'  -.-  %' 


•••  -  •  s 

•. ,-.V,  j 


•  •  A  *  , 
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Algorithm  2.3.2:  Replace  step  2  in  Algorithm  2.3.1  with; 
(2' )  Compute  <i  =  S(P).  That  is; 

j.v 

All  other  steps  remain  the  same. 


Propc^sition  2.3.3;  Suppose  P  is  a  stopping  point  of  Algorithm  2.3.2.  Then  P  is 
consistent. 

Proof;  A  point  P  is  a  stopping  point  of  Algorithm  2.3.2  if  and  only  if  u=0 
solves  problem  1.  If  u=0,  then  v*(i<u*q=0’q=0  for  all  tangent  vectors 
vf  Tp.  On  the  other  hand,  if  t‘q<0  for  all  tc Tp,  then  u  =  0  maximizes  u*q 
for  ufTpOBjIO).  According  to  Observation  2.3.1,  t*q<0  for  all  tcTp  is 
equivalent  to  the  variational  inequality,  which  is  in  turn  equivalent  to  P  being 
consistent  (Theorem  2.3.1). 

At  this  point,  we  have  presented  the  relaxation  labeling  algorithm  in  such 
a  way  that  the  stopping  points  of  the  algorithm  are  consistent  labelings. 

Recall  that  a  labeling  is  strictly  consistent  if 

i:Pi(X)Si(A)  >  V  Vj|A).S;(X),  i  =  I . n 

X  X 

whenever  v  /  P,  vrK.  As  a  result,  the  variational  inequality  can  be  replaced 


by  the  statement 


v;  ^ijix.x'iPjlx'Mviixi-Piixixo 


C-’.'.VW 

'J 

.V  a' 
.  • .  • .  H 


Vv-C>. 

V  V  •j' 

vvV 


.  -.V. 


for  nil  V  r  K,  v  /  I’ 


for  a  strictly  consistent  labeling.  In  particular,  q*u  <0  for  all  nonzero  tangent 
directions  u  at  a  strictly  consistent  labeling  I*.  We  claim  that  |*f  K*  (i.e..  that 
P  is  an  unambiguous  labeling).  Supi)ose,  for  contradiction,  that  0<  Pj,,(X„)  <  I 
for  some  (i,^,  X„).  Then  for  some  other  X„.  0<Pj„(X„)<  I.  We  consider  two 
tangent  directions, 

jo  ,  /  i„ 

=  |(0,...,0,1,...,-1,...,0),  =  i„, 

and  U.2  =  -  Uj  . 

That  is,  U|  has  a  I  in  the  (i^.X^)  position  and  a -I  in  the  (i,„X„)  position,  and  il.j 
is  the  other  way  around.  The.se  are  valid  tangent  directions  according  to  th<> 
formulation  of  Tp.  However,  <i*U|  so  they  cannot  both  be  negative. 

Hence,  we  have  shown  that  a  strictly  consistent  labeling  P  must  be 
unambiguous.  Thus  if  q  points  away  from  the  surface  K  at  a  vertex  (i.e..  an 
unambiguoirs  consistent  labeling),  then  q  will  point  generally  toward  the  vertex 
at  nearby  a.ssignments  in  K.  .Accordingly,  if  P  is  near  the  unambiguous 
consistent  labeling,  moving  P  in  a  tangent  direction  u  that  points  in  the  same 
direction  as  q,  should  cause  P  to  converge  to  the  vertex. 

2.3.5  Relaxation  Operator 

Algorithm  2.3.2  updates  weighted  labeling  assignments  by  computing  an 
intermediate  vector  q,  where 

qi(M  =  X:  ^  T,i(x,x'  iPjix' ) 

j  X' 


and  then  updating  P  in  the  direction  defined  by  the  projection  of  q  onto  Tp. 
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As  we  shall  show,  the  original  updating  formula  [1]  has  Iho  intermediate  veetor 
q  defined  by 

^  > 

j  X' 

In  Algorithm  2.3.2,  we  set  q|(X)  =Si(X).  Here  the  support  vector  S  is  a  function 
S(P),  where  S;(X)  is  computed  from  a  nonlinear  function  of  current  assignment 
values  in  P.  Presumably,  Si(X)  depends  on  the  components  of  Pj  for  objects  j 
near  object  i,  and  is  relatively  independent  of  the  values  of  Pj  for  objects 
distant  from  i.  Therefore,  the  dilferent  object  j  near  olqeef  i  is  given  an 
influence  on  the  support  function  q;(X)  which  depends  on  the  value  of  weighting 
constant  djj  and  has  no  influence  on  q|(X)  at  all  if  djj  is  zero. 

The  principle  difference  lies  in  the  manner  in  which  q  is  projected  onto  a 
tangent  vector.  In  Algorithm  2.3.2,  the  tangent  direction  was  obtained  by 
maximizing  u*q  among  if  c  Tp fl  Hi(0)- 

One  of  the  standard  relaxation  formuhis  was  suggested  by  Rosenfeld  et  al. 
[3]  and  is  given  by 

'  m 

v;  Pi(e)|l  +  qi(e)| 

e  =  l 

(It  is  a.s.sumed,  when  using  this  formula,  that  the  7ij(X,X’ )  values  are  sufficiently 
small  so  that  one  can  be  sure  that  jqi(X)|  <  I.)  There  is  another  similar 
formula,  actually  derived  front  this  one  by  transforming  compatibility 
coefficient  7jj(X,X'  )  to  conditional  probability  Pij(X|  X' ).  Therefore  the  proof  of 
convergence  for  one  is  automatically  valid  for  the  other. 


To  consider  the  behavior  of  this  standard  formula,  first  assume  that  I’  is 
near  the  center  of  the  assignm(*nt  space,  so  that  v('ry  approximately 
Pi(X)~l/m  for  aii  i,  X.  The  updating  can  then  he  regarded  as  consisting  of 
two  steps.  First,  the  vector  P  is  changed  into  an  intermediate  P,  where 

Pi(X)  =P,(X)(I  +qi(X)l^IMX)  +qi(X)/m. 

Next,  P  is  normalized  using  a  scalar  cfmstant  for  each  ohjeci  Pj.  When  I’  is 
near  the  center  of  K,  this  rescaling  process  shifts  P  in  a  direction  essentially 
perpendicular  to  K.  That  is,  1*  is  reset  to  approximately  the  projection  of  P 
onto  K.  Denoting  the  orthogonal  projection  operator  by  we  have 

P  =  P'  ^  Ok(P)  Ok(P  +  q/m) 

by  virtue  of  the  continuity  of  0|j.  Further,  assuming  that  P  is  in  the  interior 
of  K,  and  q  is  sufficiently  small,  then 

0|,(P  +  q/m)  =  F  +  -^  OT(q) 

m 

where  Oj  is  the  orthogonal  projection  onto  the  linear  subspace  Tp.  However, 
the  solution  u  to  problem  1  is  obtained  by  normalizing  Oj(q).  Combining,  we 
have  that 

P  ~  P'  +  a  u 

for  some  scalar  a.  Thus,  P  is  reset  to  a  vector  which  is  approximately  the 
updated  vector  that  one  would  obtain  by  Algorithm  2.3.2. 

When  P  is  close  to  an  edge  or  corner,  the  situation  is  somewhat  more 
con>plicated.  The  first  step  in  standard  updating  (i.e., 
Pi(X)  =Pi(X)[l  +  qi(X)]  =Pi(X)  +  Pi(X)qi(X))  can  be  viewed  as  an  initial 
operation  changing  q,  since  the  components  of  q  corresponding  to  small 


components  of  P  have  minimal  effect  (i.e.,  the  motions  in  directions 
perpendicular  to  the  nearby  edges  are  scaled  down).  The  normalization  step  is 
the  same  as  before.  Therefore,  the  formula  results  in  attenuation  of  motion 
perpendicular  to  an  edge.  Further,  a  zero  component  can  never  become 
nonzero  even  if  the  evidence  supports  the  value. 

2.3.6  Supervised  Relaxation  Operator 

Now  we  are  prepared  to  establish  the  convergence  property  of  the 
supervised  relaxation  operator. 

The  original  relaxation  operator  [3]  indicates  that  Pi(X)  is  updated  by 
Pi(X)[l  +  qi(X)]  in  which  q|(X)  is  the  neighborhood  function.  Actually, 
Pi(X)  [1  +  qi(X)j  =P|(X)  +  Pi(X)qi(X).  These  m  likelihoods.  Pi(X),  X  = 
form  a  m-vector,  Pj,  for  the  current  object  i.  Similarly,  Pi(X)qi(X),  X  =  l,...,m, 
form  another  m-vector,  P;qj.  The  formula,  Pi(X)qi(X),  implies  that  the 
neighborhood  contribution  to  the  class  X  at  current  object  i  should  remain 
relatively  unchanged  if  the  class  X  at  current  object  i  has  likelihood  close  to  1; 
otherwise  it  should  decrease.  The  likelihood  spare  of  Pj  and  the  vector  Pjqj  are 
shown  in  Fig.  2.3.3,  in  which  Pjqj  influences  the  movement  direction  of  Pj. 
After  normalization,  the  summation  of  the  components  in  the  newly  updated 
vector  Pj  is  equal  to  I.  That  means  the  vector  Pj  +  Pjq;  is  rescaled  to  make 

—  f 

the  new  vector  Pj  still  in  the  likelihood  space. 

In  the  supervised  rola\atk)n  algorithm,  before  t’|,  defined  later,  influences 
the  movement  direction  of  Pj,  the  Pj(X)  is  influenced  by  QjjX)  using  the  formula 
Pj(X)(^i(X)  in  which  Qi(X)  is  also  a  neighborhood  contribution  but  the 
compatability  coefficient  ijjlX.X'  )  is  expressed  in  terms  of  the  conditional 


probability  Pij(Xl  X' ).  This  can  be  seen  in  Section  2.1.  Qi(X),  X  =  forms 

a  m-vector  Q;.  For  the  supervised  relaxation  operator,  the  vector  P;  is  then 
modified  based  on  the  probability  derived  from  ancillary  data  sources  using  the 
formula  Pi^(X)  =  P- {X)V'i(X).  Again,  ^f'i(X),  X  =  I,...,m,  forms  a  m-vector 
The  vector  for  every  object  i,  i  =  l....,n,  is  fixed  and  doesn't  change  its 
component  values  while  the  supervised  relaxation  algorithm  proceeds.  But  the 
vector  Vi  at  an  object  i  and  V’j  at  an  object  j  are  likely  to  be  different.  That  is, 
every  object  i  has  different  vector  Vv  These  three  vectors,  Pi,Qi,  and  V,,  are 
shown  in  P'ig.  2.3.4  in  which  vectors  Qj  and  Vj  compete  with  each  other  to 
influence  the  movement  direction  of  I’j.  Of  course,  Pj  has  its  preference  to  one 
of  labels  a.ssigned  to  object  i  llirougli  the  initial  likelihoods  P/®).  For  example, 
if  Vi  and  Q;  don’t  have  enough  influence  to  force  Pj  move  away  from  the  vectex 
(1,0,0),  in  which  X|  is  favored  by  P/“',  shown  in  Fig.  2.3  4,  the  final  labeling  of 
Pj  will  move  toward  and  stay  at  this  vertex  (1,0,0);  otherwi.se  it  will  move  to 
one  of  the  other  two  vertices.  From  theorem  0,1  in  [13],  Pj  will  reach  an 
unambiguous  labeling  assignment  |i.c..  moves  to  one  of  the  three  vertices)  if  Pj 
approaches  sullicienlly  close  to  any  on<'  of  these  three  vertices. 

2.4  Simulation  and  l^xperimental  Results 

The  supervised  relaxation  algorithm  was  programmed  and  aj)plied  to  the 
analysis  of  a  set  of  Fandsat  mult  i''p(<ii:i|  data,  I'he  data  were  co||(>cf('d  by  I  In* 
satellite  over  the  Sa!i  .luaii  Mountain>  in  S\\  Color.ido  [1],  The  objective  of 
the  analysis  was  to  discriminate  among  the  ground  cover  classi's  such  as  ■oak", 
“ponderosa  i)ine”,  "aspen",  "pasture",  "douglas  and  white  fir",  "snow", 
■‘water’’,  and  "other”,  where  the  last  category  was  simply  a  catchall.  Each 
class  was  actually  decomposed  in  the  analysis  process  of  clustering  and  merging 


into  a  union  of  subclasses,  each  having  a  data  distribution  describable  as 
approximately  multivariate  normal. 

2.4.1  Results  from  the  Maximum  Likelihood  Classification  Algorithm 

To  provide  a  baseline  for  comparison,  the  data  fiom  the  first  and  the 
second  channels,  wliicli  were  in  the  range  of  visible  wavehuigths,  were  first 
analyzed  using  the  maximum  likelihood  classification  algorithm.  The  a  priori 
likelihoods  of  the  classes  were  approximated  as  being  equal,  and  2122  test 
.samples,  independent  of  the  training  samples,  were  used  to  evaluate  the  results. 
Due  to  dilTerent  numbers  of  test  samples  in  each  class  being  used  in  the 
evaluation  of  the  algorithm  performance,  the  measure  called  average 
performance  by  class  is  used  to  avftid  any  bias  toward  any  class  which  has  the 
largest  number  of  test  samples.  As  shown  in  Table  2.4.1,  the  average 
performance  by  class  of  this  conventional  maximum  likelihood  classifier  was 
38.5  percent  correct. 

2.4.2  Results  from  the  Relaxation  Operator 

To  implement  the  relaxation  analysis,  the  most  formidable  job  is 
estimating  both  the  initial  likelihoods  and  the  conditional 

likelihoods  P,j(X[X').  The  initial  likelihoods  are  available  in  the  penultimate 
step  of  the  maximum  likelihood  classification  using  2-channel  spectral  data  only 
(i.e.,  just  prior  to  the  pixel  being  labeled  according  to  the  largest  of  these 
likelihoods).  'I'he  reijuired  context  conditional  likelihoods  Pij(X|X')  were 
estimated  from  the  linal  classification  results  [irodiiced  from  the  maximum 
likelihood  cla.ssificat ion  algorithm  by  counting  joint  and  individual  occurrences 
of  the  classes.  However,  rather  than  computing  four  different  sets  of  these 


Table  2.4.1.  Test  Results  for  Classification  of  the  V^allecito  (Quadrangle 
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corresponding  to  eacii  diircrml  neighbor  type  (left,  right,  above,  and  below),  a 
single  sot  was  calculated  by  counting  joint  occurrences  in  both  directions  both 
vertically  and  horizontally. 

The  same  test  samples  were  used  to  evaluate  the  results.  The  results  of 
this  rela.vation  operator  at  iteration  number  r>,  10,  and  *20  are  shown  in  Table 
2.4.2.  The  final  result  was  slightly  better  than  llx'  maximum  likelihood 
classification.  The  average  performance  by  class  was  10.3  percent  correct. 

2.4.3  Results  from  the  .Supervised  Relaxation  Operator  with  One  Ancillary 
Information  Source 

Due  to  the  non.ivailability  of  multiple  ancillary  information  sources  and 
the  desire  to  demonstrate  the  feasibility  of  the  algorithm,  channel  3  data  was 
used  as  ancillary  data  in  the  experiment.  The  supervised  relaxation  operator  is 
now  shown,  by  example,  to  be  a  useful  tool  for  incorporating  information  from 
one  ancillary  data  source,  channel  3  data,  into  an  existing  classification 
produced  from  the  maximum  likelihood  algorithm  using  2-channel  spectral 
data.  Channel  3  is  an  infrare<|  s|)ectral  band.  The  mean  and  standard 
deviation  of  each  class  having  a  data  distribution  describable  as  approximately 
normal  were  estimated  for  channel  3  using  clustering  and  merge-statistics 
algorithm.  From  these  data,  sets  of  for  each  pixel  in  the  image  were 

generated  and  incorporated  in  the  supervised  relaxation  algorithm.  The  same 
initial  likelihoods  and  conditional  likelihoods  were  again  used.  Several 
relaxation  tests  were  performed  using  differing  degrees  of  supervision,  i.e., 
various  weighting  constants  given  to  the  influence  of  the  ancillary  data  via  the 
parameter  /C  The  value  of  which  produced  the  best  results  was  chosen. 


Table  2.4.2.  Test  Results  of  Relaxation  Operator  at  Iteration  iVniTiber  .'j,  10. 
and  20 
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Again  tho  same  test  samples  were  used  lo  evaluate  llie  results.  As  shown  in 
Table  2.4.3,  the  average  performance  by  class  was  better  than  the  ordinary 
relaxation  analysis.  The  final  result  is  64.1  percent  correct.  In  addition,  a 
closer  look  at  the  class-by-class  results  reveals  that  the  performance  for  each 
class  was  better  than  those  attained  using  the  relaxation  operator  without 
ancillary  information. 

2.4.4  Results  from  the  Supervised  Relaxation  Operator  with  Two  Ancillary 
Information  Sources 

In  this  section,  the  supervised  relaxation  algorithna  is  shown  to  be  an 
effective  technique  for  incorporating  information  from  two  ancillary  data 
sources,  channel  3  data  and  elevation  data,  into  an  existing  classification  to  see 
any  improvement  in  classification  accuracy  over  that  obtained  with  only  one 
ancillary  source.  Figure  2.1.4  shows  the  distribution  of  tree  species  as  a 
function  of  elevation  for  an  area  northeast  of  the  Vallecito  Reservoir  in  the 
Colorado  Rockies.  Fig.  2.1.2  shows  a  digitized  terrain  map  for  the  area  covered 
by  the  multispectral  scanner  data  described  earlier.  From  these  data,  the 
second  set  of  </)i(X)  for  each  pixel  in  (he  image  was  generated  and  used  along 
with  those  generated  from  the  first  set  of  in  the  supervised  relaxation 

algorithm.  The  same  initial  likelihoods,  conditional  likelihoods,  and  test 
samples  were  used.  The  weight  constant  producing  the  best  results  was  chosen. 
As  shown  in  Table  2.4.4,  the  average  performance  by  class  using  two  ancillary 
data  sources  of  information  gave  (he  be.s(  result,  80.8  percent  accuracy.  The 
results  from  (he  maximum  likelihood  algorithm,  the  relaxation  algorithm  and 
the  supervised  relaxation  algorithm  are  compared  and  drawn  in  Fig.  2.4.1. 
From  this  figure,  it  is  clear  that  the  more  information  we  use,  the  more 


Table  2.4.3.  Test  Results  of  the  Supervised  Relaxation  Alg(»rilhm  with  One 
Ancillary  Information  Source  at  Iteration  Number  5,  10,  and  20 
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Table  2.4.4.  Test  Results  of  the  Supervised  Relaxation  Algorithm  with  Two 
Ancillary  Information  Sources  at  Iteration  Number  5,  10,  and  20 
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accuracy  in  classification  can  be  achieved.  The  probabilistic  relaxation 
methods  provides  an  effective  approach  for  integrating  information  from  diverse 
sources  of  image  data. 

For  the  prcsnnt  study,  there  is  no  specific  way  to  determine  the  value  of 
weighting  constant  in  F(|nalion  (2. In  any  specific  image  to  be  classified, 
the  significance  of  the  ancillary  data  will  depend  on  its  relevance  and  accuracy. 
Consequently,  the  optimum  degree  of  supervision  must  be  estimated  using 
training  data  just  as  training  data  are  used  in  establishing  classifier  parameters. 

The  algorithm  is  terminated  after  the  fixed  points  have  been  reached,  that 
is,  when  the  likelihoods  assigned  to  each  class  at  every  pixel  do  not  change 
when  the  algorithm  moves  from  current  iteration  to  next  iteration.  I'or  the 
current  study,  after  20  iterations  the  algorithm  had  reached  the  fixed  points. 
The  final  labeling  represents  a  balance  in  consistency  between  spectral 
information  in  the  initial  likelihoods  spatial  context  data  incorporated 

in  Pij(X|  X' ),  and  ancillary  information  embedded  in  the  t/’i(X). 

2.5  Sumniary  and  Conclusions 

The  relaxation  operator  has  been  adopted  as  a  mechanism  for 
incorporating  contextual  information  into  an  existing  classification  result, 
flased  on  the  formula  derived  in  ICqu.ation  (2.1.4),  the  supervised  relaxation 
operator  carriers  on  further  and  incorporates  information  from  multiple 
ancillary  data  sources  into  the  results  of  an  existing  classification,  i.e.,  to 
integrate  information  from  the  various  source.s,  reaching  a  balance  in 
consistency  between  spectral,  spatial,  and  ancillary  data  sources  of  information. 


In  Section  2.1,  the  supervised  relaxation  algorithm  was  derived  from  the 
standard  relaxation  formula  to  incorporate  ancillary  information  by  adjusting 
the  neighborhood  contribution  to  contain  the  influences  from  both  local  contc'xt 
and  ancillary  information.  'I'liis  means  that  the  moving  direction  of  the  initial 
likelihoods  is  influenced  now  by  both  local  context  and  aticillary  itifortnal ion  as 
shown  in  Fig.  2.3.4.  The  proof  that  Algorithm  2.3,2  stops  at  consistent 
labelings  can  be  used  for  the  supervised  relaxation  algorithm  by  generalizing 
the  support  (or  neighborhood  function)  not  only  from  local  contextual 
information  but  also  from  any  other  information,  which  is  useful  to  improve 
the  cla.ssific.ition  accuracy,  such  as  ancillary  information  described  in  Section 
2.4.  Thus,  the  final  labeling  results  from  convergence  of  evidence,  r(>aching 
consistent  labelings,  i.e.,  integrates  information  from  the  various  sources, 
achieving  a  ground  cover  classification  which  is  both  accurate  and  consistent  in 
the  face  of  inconsistencies  which  may  exist  among  the  data  components. 

Section  2.4  showed  experimental  results  of  the  sujn'rvised  relaxation 
algorithm.  With  the  contextual  information  incorporated  in  the  relaxation 
algorithm,  the  performance  was  slightly  better  than  that  obtained  from  the 
maximum  likelihood  classifier.  By  incorporating  one  ancillary  information 
source,  channel  3  data,  using  supervised  relaxation  algorithm,  the  performance 
was  much  better  than  previously  obtained.  For  the  area  classilied,  there  were 
data  available  describing  the  elevation  preferences  of  the  various  fris'  s|)('cies. 
along  with  a  digitized  elevation  m.ip.  By  incorporating  both  (devation  dal.i  and 
channel  3  data  in  the  supervised  relaxation  algorithm,  significantly  better 


It  will  often  be  the  ease  that  one  has  only  a  classification  map  to  work 
with  rather  than  sets  of  likelihoods  generated  by  a  classification  algorithm. 
Under  this  circumstance,  arbitrary  “probability”  values  consistent  with  the 
classification  can  be  assigned  to  cla.s.ses  at  every  object  as  the  initial  likelihoods. 
Of  course,  the  label  for  a  particular  object  indicated  by  the  classifier  as  most 
likely  must  be  assigned  the  largest  probability  value.  But  for  the  present 
study,  the  initial  likelihoods,  were  available  in  the  penultimate 

step  of  the  maximum  likelihood  classification  using  spectral  data  only. 
Therefore,  classification  results  with  higher  accuracy  may  be  expected  since 
pixels  or  objects  with  only  marginal  likelihoods  from  the  maximum  likelihood 
classifier  may  have  their  cla.sses  or  labels  changed  early  in  the  process  rather 
than  having  them  fixed  erroneously  as  a  result  of  the  initial  likelihoods 
assigned.  As  seen  from  the  classification  results,  the  distributions  of  various 
classes  tend  to  be  more  homogeneous  than  that  before  relaxation.  This  is  the 
characteristic  of  the  relaxation  operator  showing  that  the  spatial  context 
information  has  been  used. 

At  the  moment,  (her(*  is  no  specific  way  to  (l<'t('rmine  the  value  of 
weighting  constant  (i,  in  l'’(|uation  (2.1.0),  through  which  the  supervised 
relaxation  algorithm  allows  the  relative  influence  of  spectral  and  ancillary  data 
sources  to  be  varied.  In  any  particular  image  to  be  cla.ssified,  the  significance 
of  the  ancillary  data  will  depend  on  both  its  relevance  and  accuracy. 
ConseqiK'iitly,  the  oi>limum  degree  of  supervision,  i.e.,  weighting  to  be  given  to 


the  influence  of  the  ancillary  data  via  the  parameter  /?,  must  be  estimated 
using  training  data  just  as  training  data  are  used  in  establishing  classifier 
parameters.  Obviously,  this  involves  additional  analysis  cost. 

The  contributions  from  the  four  nearest  neighbors  have  been  used  in  the 
supervised  relaxation  algorithm  for  the  present  study  to  incorporate  the 
contextual  information  into  an  existing  classification.  For  the  simulation 
presented  in  Section  2.4,  equal  weighting  constants,  djj,  have  been  assigned  to 
the  four  nearest  neighbors;  i.e.,  these  four  neighboring  pixels  have  equal  degree 
of  influence  in  the  neighborhood  contribution.  If  the  weighting  constant,  d|j, 
can  be  dynamically  adjusted  to  allow  different  neighbors  to  have  different 
degrees  of  influence  on  the  current  pixel  classification,  the  clas.sifieation 
accuracy  expected  may  be  better  than  that  with  fixed  weighting  constants. 
Clearly,  there  is  a  tradeoff  between  the  better  classification  accuracy  and  the 
cost  involved  in  dynamically  adjusting  weighting  constants. 

In  the  integration  of  multiple  ancillary  data  sources,  the  elementwise 
product  of  sets  of  likelihoods  derived  from  distinctive  ancillary  data  sources  is 
used.  This  means  that  each  distinctive  ancillary  data  source  is  given  equal 
degree  of  confidence.  In  a  more  complicated  case  such  as  one  ancillary  data 
source  being  more  reliable  than  the  other,  it  may  be  desirable  to  assign  two 
different  weighting  constants  to  the  different  data  sources. 

To  use  the  supervised  relaxation  as  a  post  cl.assification  technicpie.  the  set 
of  context  conditional  likelihoods  r-|XjX')  must  be  determined.  The  re(|uired 
context  conditional  likelihoods,  I’jjfXjX'),  eaii  be  estimated  from  the  results  of 
the  maximum  likelihood  classifier  if  no  other  spatial  moch'l  known  to  be  correct 
for  the  image  under  consideration  is  available.  Obviously  the  results  will  suffer 
some  inaccuracy  because  the  conditional  likelihoods  are  estimated  from  the 
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results  which  are  not  perfectly  correct.  For  (he  simulation  results  pre.sented 
here,  the  contextual  conditional  likelihoods  were  estimated  from  another  source 
of  data  known  to  be  correct.  When  (he  number  of  possible  classes  involved  is 
large,  the  number  of  PjjlX)  X' )  values  is  also  large.  Based  on  the  work  in  (20), 
it  is  suggested  that  highly  accurate  compatibilities  are  not  required.  Therefore, 
reasonable  values  can  be  estimated  from  the  results  of  the  maximum  likelihood 
classifier  or  from  some  foreknowledge  of  the  .spatial  characteristics  of  the  image 
data  if  the  initial  likelihoods  in  the  penultimate  step  of  the  classification 
algorithm  are  not  available. 


CHAPTER  3  -  PARALLEL  PROCESSING 


This  chapter  will  describe  how  an  optimal  system  configuration  can  be 
determined  based  on  the  perf«)rmanc(*  criteria  and  parallel  architecture 
described  ir.  this  chapter.  An  SIMI)  (Singh'-lnstniction-stream,  Miilti|)le-I')ata- 
stream)  machine,  as  shown  in  Fig.  3.1,  consists  of  a  control  unit,  N  processors, 
N  memory  modules,  and  an  interconnection  network.  A  proce.s.sor  and  a 
memory  module  forms  a  processing  element  (PE).  The  control  unit  broadcasts 
instructions  to  all  active  PEs  and  all  active  PEs  execute  the  same  instruction, 
in  lock-step,  on  the  data  in  their  own  memories.  The  interconnection  network 
provides  a  communication  facility  for  the  PEs.  .-\n  MIMl)  (Multiph*- 
Instruction-stream,  Multiple-Data-stream)  machine,  as  shown  in  Fig.  3.2., 
consists  of  a  coordinator,  N  PEs,  and  an  interconnection  network.  Every  PE 
fetches  instructions  from  its  own  memory  and  executes  them  on  the  data  in  its 
own  memory.  Every  PE  executes  the  instructions  inde|)endently  and 
asynchronously.  The  interconnection  network  providts  a  communication 
medium  for  the  PEs.  The  coordinator  orchestrates  the  activities  of  the  Pl]s. 

Section  3.1  will  present  the  application  of  S-Nets  [I  LIT)]  to  describing  an 
SIMD  and  pipeline  implementation  of  maximum  likelihood  cla.ssilication,  and 
several  performance  measures  to  evaluate  the  inherent  parallelism  in  this 
algorithm  will  be  discussed  in  this  section.  In  Section  3.2,  for  the  algorithm  in 
block  C  shown  in  Fig.  3.2.1  the  algf)rilhm  execution  times  l),is(>d  on  the 
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counting  of  explicit  and  implicit  operations  in  both  SIMD  and  MIMD  mode  of 
parallelism  will  be  compared  and  discussed.  In  Section  3.3,  several  performance 
measures  different  from  those  in  Section  3.1  for  SIMD  will  be  discussed.  Most 
important  is  the  determination  of  the  optimal  number  of  processing  elements 
for  a  given  image  size  by  considering  the  tradeoff  between  the  cost  of  execution 
time  and  processing  elemei\ls.  Section  3.4  will  describe  how  two  multistage 
networks  [10,17,18,19],  cube  network  and  network,  can  be  configured  to 

support  simultaneously  both  MlMl)  and  SIMD  modes  of  parallelism. 

3.1  Modeling  Image  (Massification  by  Means  of  S-Nets 
Synchronous  Nets  [14,15],  hereafter  S-Nets,  are  an  extension  of  Petri  nets 
and  were  developed  especially  for  the  description  of  SIMD  processes.  This 
section  presents  the  .ajiplication  of  S-Nets  to  describing  maximum  likelihood 
classification,  which  is  commonly  used  for  classifying  remote  sensing  image 
data.  This  application  is  fairly  typical  of  image  processing  operations  which 
are  not  window-type  operations,  i.e.,  they  depend  on  the  data  at  a  single  pixel 
rather  than  a  neighborhood.  In  general,  the  higher  the  dimensionality  of  the 
remote  sensing  (multispectral)  data  and  the  more  classes  represented  in  the 
image,  the  greater  the  potential  benefits  to  be  derived  from  SIMD 
implementation  of  the  process.  This  section  begins  with  an  introductory 
overview  of  S-Nets. 
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3.1.1  S-Net  Structure:  Overview  [U] 

S-Nets  are  defined  in  terms  of  sHs.  The  elomenls  of  a  set  are  designated 
within  {  }.  The  notion  of  tuples  is  denoted  by  <  >;  a  tuple  consists  of 
ordered  components.  .An  S-Net  graph  is  a  quadruple  (T.S.r,.A).  with  an  initial 
marking  Ko  and  a  set  of  transition  descriptors  1).  where: 

T  =  A  finite  set  of  transitions  {t,,to,  .  .  .1|T| } 

S  =  A  finite  set  of  scalar  places  {sj.so,  .  .  .  ,sjs|  }• 

U  =  A  finite  set  of  vector  mask  places 

{<V„M,>,<V2,M2>.-<V)uiAV’P^' 

A  =  A  finite  set  of  directed  arcs  {a,,ao,...,a|  a|  )■ 


Additional  symbols  utilized  in  constructing  S-Net  graphs  are  shown  in  Table 
3.1.1. 

A  marking  associates  a  non-negative  integi'r  with  I'acli  scalar  place  -  K(s) 
for  each  s  e  S;  and  two  vectors  of  non-negative  integers  with  each  vector-mask 
place,  one  of  these  vectors  being  a  boolean  vector  K(Vi),  for  each  V;,  and  K(Mi) 
for  each  Mi.  An  initial  marking  Ko  is  defined  as  the  first  marking  of  the  S-Net. 
The  set  of  possible  mask  markings  for  any  Mj  is  W(Mi).  The  marking 
<1,1, 0.0, 0>  of  Mi  shall  be  denoted  as  <l2,0='>.  In  S-Net  graphs,  markings 
are  indicated  by  the  presence  of  tokens.  Dots  in  any  place  represent  tokens. 
The  symbol  1  shows  the  presence  of  a  token  in  a  mask.  An  assignment  of 
tokens  to  a  vector  place  V;  may  leave  some  of  the  component  places  marked 
with  tokens  and  others  empty.  The  dynamic  behavior  of  S-Nets  is  described  as 


follows. 
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A  scalar  place  is  holding  if  it  has  at  least  one  token  in  it.  A  vector-mask 
place  <Vi,Mi>  is  holding  if  at  least  one  K(mjj)  =  1,  j  =  M,]  .  and  the 

corresponding  Vjj  e  Vj  has  a  non-zero  marking.  A  Iransilion  t  is  enabled  if  all 
scalar  places  of  transition  t  are  holding  and  all  vec(or-m:isk  places  of  transition 
t  are  holding.  When  a  transition  t  is  enabled,  its  firing  function  is  defined  at  a 
given  marking  Kn  of  the  S-Net,  and  the  firing  yields  Kn  +  i,  a  new  marking.  A 
transition  type  specifies  the  firing  capabilities  of  the  transition  -  either  simple 
or  mask  firing  -  designated  SFT  and  MFT  respectively.  A  transition  descriptor 
D[t]  for  a  transition  t  with  vector-mask  output  places 
<Vi,Mi>,  <V-,Mj>,  .  .  .  ,  <V„M,>  is  specified  as  I)[t)  =  (type; 

K(Mi)eW(Mi),  K(Mj)fW(Mj),  .  .  .  ,K(M,)tW(M,)]. 

A  simple  firing  associated  with  an  enabled  transition  t  is  such  that: 

(a)  For  every  scalar  input  place  s,  K^  +  ifs)  =  KJsj-l. 

(b)  For  every  scalar  output  places,  K„  +  ,(s)  =  Kn(s)  +  1. 

(c)  For  every  vector-mask  input  place  <Vj,Mi>,  then  for  VijcV;, 

j  =  l,2,...,j  Vjj  ,  Kn  +  i(vij)  =  Kn(Vij)-I  for  those  j  for  which  mijcM;  has  a  non-zero 

marking;  and  for  mijtMj,  Kn  +  i(mij)  =  KJmjj)  for  all  j. 

(d)  For  every  vector-mask  output  place  <Vj,Mj>,  then  for  Vi^fV;, 

j  =  1,2,...,|  Vi|  ,  Kn  +  i(Vij)  =  Kn(Vjj)  +  l  for  those  j  for  which  mijfMj  has  a  non¬ 

zero  marking;  and  for  mijcM;,  K„  +  |(mij)  =  K„(mjj)  for  all  j. 

As  seen  from  the  firing  rules,  SFTs  do  not  alter  their  input  or  output  masks. 

A  mask  firing  is  a.ssociatod  with  an  enabled  transition  t  that  has  at  least 
one  <Vj.Mj>  output  place,  and  is  such  that; 


aatjjojijfa 


(a)  For  every  scalar  input,  place  s,  K„  +  |(s)  =  Kn(s)-1. 

(b)  For  every  scalar  output  place  s,  K^  +  ,(s)=Kn(s)  +  l. 

(c)  For  every  vector-mask  input,  place  <V,,Mi>,  then  for  VjjfVi, 
j  =  l,2,...,|  Vi|  ,  K„  +  i(vjj)  =  Kn(''ij)~*  fo’’  l-hose  j  for  which  mijiM;  has  a  non-zero 
marking;  and  for  mijrMj,  Kn  +  ,(mij)  =  KJmjj)  for  all  j. 

(d)  For  every  vector-mask  output  place  <Vi,Mi>,  then  for  VijfVj, 

j  =  V||  .  K„  +  ,(v,j)  =  Kn(Vjp-H  for  those  j  for  which  mjjfMi  has  a  non¬ 

zero  marking;  and  for  M^,  Kn  +  i(Mi)<W(Mi),  where  W(Mi)  is  specified  by  the 
transition  descriptor  D(t),  and  Kn  +  |(Mj)  is  non-deterministically  chosen. 


’  •_*  fL* 


Ry  the  firing  definitions,  firings  remove  tokens  from  some  places  and  add 
tokens  to  other  places.  However,  the  number  of  tokens  subtracted  by  a 
transition  firing  does  not  necessarily  ecpial  the  number  that  it  adds. 

SFTs  on  firing  do  not  change  the  K(Mi)  of  their  <Vj,Mj>  input  and 
output  places;  the  markings  for  any  output  masks  of  the  transition  are  specified 
at  Kq.  This  transition  descriptor  is  noted  simply  as  D[t)  =  jSFT; _ ]. 

For  MFTs,  the  |  W(Mi)|  >  I  for  all  output  masks.  These  markings  are 
accompli.shed  by  the  transition  firing  and  after  initial  marking,  and  the  set  of 
markings  must  be  listed  in  the  transition  descriptor,  i.e.,  D[t]  =  [MFT; 
K(Mi)  r  {<  >,  <  >,...,<  >}].  In  cases  where  the  set  of  markings  range  over 
the  I  Vfj|  -fold  Cartesian  product  of  the  boolean  sot,  then; 

D[t]  =  [MFT;  K(Mi)  f  ,  where  B  is  the  set  of 

b(K)lean  numbers  {0,1}. 

An  S-Net  is  safe  iff  l\(s)  <  I  for  all  s  r  S;  K(vjj)  <  1  for  all  V”  c  V;. 


"•  A  ••  •' 


'  A'.  ■' 


m 


.  %■  v-l 


The  Average  Vector  Paralteliam  h  achieved  over  some  sequence  of  non 
primitive  transitions  t„  =  tj,tj  +  ,,...,ti,  is  defined  as 

k 


where  e,  is  the  time  units  the  transition  t,  takes  to  complete  its  operation,  and 

e„  =  E  e,. 

f=j 

The  Average  Parallelism  h  achieved  over  some  sequence  of  non-primitive 
transitions  t„  =  tj,tj  +  i,  .  .  .  ,t|,  is  defined  as 


k 


3.1.3  Stone’s  Vector  Sum  Example  [14] 

The  problem  is  to  compute 

Yn  =  E  A;,  n  =  0,1,. ..,7 
i=0 

so  that  Y(j  =  Afl 

Y,  =  Ao  +  A, 


Y7  -  Ao  +  A,  +  •  +  A; 

where  Ao,Ai,...,A7  are  scalars.  Assume  there  is  a  SIMD  machine  of  8  PEs 
(N=8).  A  high  level  language  expression  of  the  computation,  in  simplified 
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form,  is  as  follows. 

1.  Initialize  Y[I]  to  A[I],  0  <  I  <  N-1 

2.  For  k  =  1  step  1  until  log2N,  do 

3.  Begin:  Hfl]  =  Y[I-2'‘  'l,  (mask) 

4.  Y[Ii  =  Y(I1  +  H[I],  (mask) 

5.  Compute  new  mask 

6.  End 

Figure  3.1.1,  representing  the  S-Nct  model  of  this  algorilhm,  assumes  an 
initial  marking; 

Ko(S,)  =  1;  Ko(S2)  =  Ko(S3)  =  Ko(S,)  =  Ko(S,)  =  (0); 

Ko(V,)  =  Ko(V2)  =  Ko(V3)  =  <0«>; 

Ko(M,)  =  <1»>. 

Descriptors  are: 

D(t,l  =  Dig  =  Dft^]  =  Dlt^j  =  (SrT;_l: 

D(t2l  =  (MFT;K(M2)  =  <0,f  >] 

D[t3]  =  [MFT;K(M3)  f  {<0,r>,<02,l«>,<0ri'>}l 

Dftel  ^  [MFT;K(M2)^  }] 

Statement  I,  modeled  hy  transition  t,,  indicates  that  all  IMvs 
simultaneously  carry  out  the  assignments: 


Since  each  PE  has  both  Y(I)  and  A(l),  Ihese  operafions  can  lake  place  “in 
parallel.” 

Statement  3,  modeled  by  t2,  requires  that  data  are  transferred  between 

participating  PEs.  For  the  firing  of  to,  H(l)  holds  Y(0),  11(2)  holds  Y(I) .  and 

H(7)  holds  Y(6).  PEq  does  not  participate,  since  K(Mo)  =  <0,1'>. 

Statement  4,  modeled  by  tj,  has  all  participating  PEs  adding 
simultaneously.  If  MFT  t^  first  fires  KlMj)  =  <0,1‘>,  then  for  the  first 
iteration  of  the  loop,  PE  0  does  not  participate.  The  first  holding  of  <V'3,M3> 
models  sums;  Y(l)  has  Y(0)  +  Y(l);  Y(2)  has  Y(l)  +  Y(2),  etc. 

A  branch  is  modeled  by  the  1.5  and  t^  transition  which  have  scalar  input 
place  S3  in  common,  modeling  the  result  of  a  test  of  the  iteration  counter. 

The  firing  of  tj  models  the  first  iteration  (k  =  l),  the  first  firing  of  tg  models 
the  second  iteration  (k=2),  and  the  second  firing  of  tg  models  the  third 
iteration  (k=3).  After  the  third  iteration,  transition  t-j  will  fire  and  the 
computation  halts,  as  all  eight  sums  reside  in  the  appropriate  Y(  )  variables. 

Despite  the  availability  of  8  PEs,  the  degree  of  parallelism  achieved  over 
the  computational  flow  is  not  8.  Over  the  firing  sequence  described  and  listed 
in  Table  3.1.2,  g  and  g  are  calculated  as  follows: 

g  =  51/13  =  3.«2;  g  =  12/13  =  3.23. 

The  maximum  parallelism  achieved  at  this  level  of  modeling  is  reduced  by  the 
nature  of  the  algorithm  and  by  the  scalar  processes  modeled  in  the  testing  of 
the  Irjop.  The  use  of  the  h  and  h  measures  are  not  considered  here.  In  a 
realistic  analysis,  some  weighting  of  the  additional  time  to  accomplish  mask 
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firing  would  be  appropriate,  as  for  transitions  to,  13,  and  tg. 

3.1.4.  Modeling  Complex  Applications  with  S-Nots  [14]. 

S-Nots  have  been  developed  to  provide  a.  tool  to  e.xpress  algorithm 
implementation  on  SIMD  and  MSl.MI)  architectures.  The  implementations  are 
modeled  by  using  safe  and  consistent  S-Nets.  In  modeling  realistic 
applications,  the  first  obvious  complication  in  S-Net  graphs  is  that  of  length, 
i.e.,  the  length  of  the  transition  sequence  will  not  fit  within  some  desired 
reference  frame.  Connectors,  such  as  shown  here 


may  be  used  to  show  where  an  arc  breaks  and  reconnects,  much  in  the  manner 
that  connectors  are  used  in  flow  charts.  The  connector  can  be  inserted  between 
a  transition  and  the  following  place,  or  between  a  place  and  the  following 
transition. 

Macro  S-Net  transitions  (macros)  are  introduced  to  support  hierarchical 
modeling  and  to  permit  a  more  abstracted  representation  of  an  event  which  is  a 
component  of  a  computation.  A  macro  transiiion  t  is  defined  as  a  transition 
which  itself  is  an  S-Net  graph.  It  begins  with  a  single  vertex  t^  and  ends  with 
single  vertex  t,,  where  t^  and  tg  themselves  can  be  macro  S-Net  transitions.  A 
label  is  appended  to  the  transition  bar  to  distinguish  macros  in  the  global  flow, 
shown  as 


where  ID  is  nny  symbol  used  by  (he  modeler 


Figure  3.1.2  illustrates  lii(T:ir(  hieal  modeling,  defining  two  macros  and 
showing  how  the  substitution  of  the  greater  detail  can  be  modeled  in  the  S-Net. 

With  this  definition,  the  non-primitive  transition  (which  is  a  sequence  of 
transition,  tj,tj  +  ,,  .  .  .  ,t|j)  is  seen  as  a  type  of  macro  since  it  represents  a 
transition  sequence  beginning  with  tj  and  terminating  with  tj^. 

3.1.5  Modeling  Maximum  likelihood  Classification  with  S-Nets 

The  purpose  of  this  section  is  to  use  S-Nets  to  model  an  SIMD 
implementation  of  maximum  likelihood  classification  [21].  Assume  that  there 
are  N=M  PEs  available  where  the  image  size  is  M-by-M.  The  PEs  can  be 
arranged  in  a  row  of  M  elements  or  in  a  \/M-by-\Ai  array.  Each  PE  will  be 
assigned  either  one  image  column  or  a  \/M-by-\/M  subimage  that  will  be  stored 
in  its  memory. 

The  SIMD  implementation  of  maximum  likelihood  classification  has  been 
described  in  (22).  For  each  of  the  m  possible  classes  into  which  a  pixel  may  be 
classified,  a  discriminant  function  is  computed.  Each  discriminant  value 
depends  on  the  pixel  vector,  the  corresponding  class  mean  vector  and 
covariance  matrix,  and  a  constant  related  to  the  prior  probability  of  occurrence 
of  the  class.  The  pixel  is  a.ssigned  to  the  class  yielding  the  largest  discriminant 


value. 


The  essential  calculations  are  as  follows.  Let 

X  =  fxi,Xo,...,XnF.  the  n-dimensional  pixel  vector 
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m 


Uj  =  |uj,U2,...,Un|^,  the  n-dimensional 


mean  vector  for  class  i 


the  n-by-n  inverse 
of  the  covariance 
matrix  for  class  i 


•-..v.y'i 


c;  =  constant  pertaining  to  class  i 

In  addition,  A(i)  will  be  the  discriminant  value  for  class  i.  Figure  3.1.3  shows 
how  these  values  are  calculated  and  a  pixel  is  classified.  The  following 
conditions  are  assumed: 


1.  All  pixels  are  to  be  stored  in  the  PE  memories.  Each  PE  is 
responsible  for  M  pixels. 

2.  Mean  vectors  (n-dimensional)  and  inverse  covariance  matrices  (n- 
by-n)  of  m  classes  are  stored  in  all  PE  memories.  The  m  class 
constants  Cj  are  also  stored  in  all  PE  memories. 

3.  Concurrency  between  the  scalar  host  and  the  a>'ray  resources  can  be 
supported  !)>  the  hardware.  The  scalar  host  can  be  regarded  as  the 
control  unit  and  the  array  resources  as  the  processors. 


The  computation  |)roceeds  as  follows: 

1.  Initialize  the  valin's  of  \(i)  to  constant  Cj,  for  1  <  i  <  m. 

2.  (  ompiit  e  the  values  of  llie  discriminant  fund  ions  for  all  m  classes. 


»  A 
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/*  Phase  A  -  initialize  the  discriminant  values  A(i)  */ 
for  i  1  to  m 
do  A(i)  <—  C; 
end 

/*  Phase  B  -  calculate  the  discriminant  values  for  each  class  */ 
for  P  ■<—  1  to  m 
for  i  <—  1  to  n 
da  y;  +-  Xj  -  u/ 

Zj  •<—  0 

end 

for  j  <—  1  to  n 
for  k  1  to  n 

do  Zj  ^  Zj  +  e/k  *  Yk 
end 

A(P )  —  A(P )  -  Yj  *  Zj 
end 
end 

/*  Phase  C  -  find  maximum  discriminant  value  and  class  */ 

j  I 

for  i  ■>-  2  to  m 
do  if  A(i)  >  A(j)  then  j  i 
end 

Fig  3.1.3.  Maximum  Likelihood  Classification  of  a  Pixel 
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3.  Choose  the  maximum  value  of  the  discriminant  functions. 

4.  If  there  are  more  data  in  PEs,  then  go  to  step  1. 

5.  End 


The  S-.Net  modeling  tliis  compulal  ion  is  shown  in  Figure  3.1.1.  There  are 
three  macros,  I.NIT  A,  DIS  H  and  MAX  C,  defined  in  Figures  3.1.5,  3.1.6,  and 
3.1.7,  respectively.  A  brief  description  of  the  S-Net  is  as  follows: 

-  The  dimensionality  of  the  V-,  components,  |  Vj  =  M,  represents  the 
number  of  PEs  available  in  this  SIMD  implementation. 

-  The  initial  marking  of  the  S-Net  is 
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^^0(^2)  ~  ^0(^3)  ~ . ~  ~  (0); 

Ko(V,)  =  Ko(V2)  = .  =  Ko(V7)  =  <0^>; 

~  ^^0(^2)  ~ .  ~  ~  <l^*>; 


-  Descriptors  are; 

D(t,I  =  D[t2l  =  .  =  D(t26]  =  [SFT,_1; 

Dltsj]  =  [MFT;K(M7)  f  B^*]; 

D(t28l  =  f>(f2fll  = .  =  0(133]  =  [SFT,_| 


-  S,  has  a  token  in  Figure  3.f.  I  initially,  so  the  INIT  A  macro  transition 
i.s  enabled  and  begins  execution  when  its  first  transition  ti  fires. 


-  .  ^ 


Fig.  3.1.4  S-Net  Mode!  of  Maximum  Likelihood  Classification 


-  The  transition  tj  models  the  initialization  of  A(i)  to  a  constant,  Cj, 
1  <  i  <  m,  with  all  PEs  being  active  because  |  V,  |  =  M  and 
K(Mj)  =  <1^>,  and  the  execution  of  the  control-flow  or  scalar 
instructions  in  the  control  unit. 

-  The  transition  t3  models  the  event:  “check  if  i  is  greater  than  m  or 
not.” 

-  The  transition  t26  models  a  test  performed  with  all  PEs.  From  this 
test,  a  mask  marking  is  implicitly  formed. 

-  KIM;)  is  marked  on  the  graph  as  #  which  means  it  is  data  dependent. 

-  To  make  timing  analyses  more  realistic,  different  time  units  are 
assigned  to  the  transitions  depending  on  the  operations  they  represent. 
The  notation  at  each  transition  is  “t-,  =  time  units.” 

Because  of  the  data  dependent  condition,  an  execution  of  this  S-Net  over 
the  sequence  of  transitions  in  the  MAX  C  macro  will  not  reveal  the  number  of 
tokens  fired.  The  degree  of  parallelism  is  also  data  dependent.  (This  difficulty 
of  data  dependence  is  not  exclusive  to  S-Nets;  it  must  be  dealt  with  in  any 
modeling  analysis.)  In  examining  the  model  for  synchronization  between  the 
stages  of  S-Nets  and  measuring  the  degree  of  parallelism,  we  a.ssume  all  PICs 
are  active. 

The  results  of  an  execution  of  (he  S-Net  from  t,  to  (33  are  sliown  in  Pahle 
3.1.3.  From  this  table,  the  quantitative  measures  of  parallelism  are  calculated 
as  follows: 
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Etot  =  E  “i*ei  =  (4+8m  +  6mn+3mn^]M  time  units 
i=l 

33 

V  gi*ni*ei  =  [-2  +  3m+5mn+3mn^lM^ 

i=l 

33 

V  gj*ni*ei  =  [4  +  8m  +  6mn+3mn"-2M  +  3mM  +  5mnM  +  3mn^MjM 
i=l 

There  are  two  types  of  quantitative  measure  for  concurrency. 


i)  The  average  vector  parallelism  h  only  accounts  for  the  concurrency  of 
processing  elements;  it  docs  not  consider  the  overlap  of  array  and 
scalar  or  control-flow  instructions. 


33 


E 


(~2  +3m  +  5mn  +3mn^lM^ 
(4  +8m  +6mn  +3mn^]M 


Typical  values  for  a  remote  sensing  application  are  n  =  1,  m  =  16, 
M  =  1024.  Plugging  these  values  into  the  above  equation,  we  get  the 
utilization  of  the  M  Plvs  is 


Jl  =  1134 
M  1284 


0.883 


i.e.,  the  utilization  of  the  1024  PEs  is  88.3%. 
ii)  The  average  parallelism  h  accounts  for  l)oth  concurrency  of  processing 


elements  and  for  the  overlap  of  array  and  scalar  or  control-flow 
instructions. 
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33 

E  gi*ni*ei 

i=l 


_  ( 4  +  8m  +  6mn  +  3inn^)M  ^  (-2 +3m +5tnn  +3mn^)M^ 

(4+8m  +  6mn  +3inn“)M  (4+8m+6mn  +3mn^)M 

Using  the  typical  values  of  nfi,  n,  and  M  in  the  above  equation,  the 


utilization  of  the  M  PEs  is 


JL  =  1  ^  1134 

M  1024  1284 


+  =  0.884 


By  this  measure,  the  utilization  of  the  1024  PEs  is  88.4%.  Since  1024 
processing  elements  are  available,  the  overlap  of  the  control  unit  with 
the  array  processors  contributes  very  little  to  the  degree  of 
parallelism.  If  the  number  of  processing  elements  were  much  less  than 
what  we  have  here,  the  overlap  of  the  control  unit  with  the  array 
processors  would  be  more  significant. 


3.1.6.  Modeling  a  Pipeline  Implementation  of  Maximum 
Likelihood  Classification 

In  this  section,  a  pipeline  implementation  of  maximum  likelihood 
classification  will  be  modeled  with  S-Nets.  Recall  that  there  are  N=.M  PEs 
available.  'I'he  PEs  will  be  interconnected  to  form  a  set  of  parallel  pipelines 
each  operating  on  its  own  subimage.  To  enable  direct  comparisons,  the 
proposed  architectures  will  have  the  same  total  number  of  identical  processing 
elements.  The  following  conditions  are  assumed; 
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1.  The  n-dimensional  mean  vector  and  n-by-n  inverse  covariance 
matrix  of  class  i  have  been  stored  in  PEj.i,  as  have  the  m  class 
constants. 

2.  The  parallelism  being  exploited  has  each  stage  concurrently 
performing  a  different  .step  of  the  task  on  a  different  data  item 
(pixel). 

3.  Assume  —  is  an  integer  and  that  there  are  —  pipelines.  Each 

m  m 

pipeline  will  process  mM  pixels. 

4.  In  calculating  the  degree  of  parallelism,  the  overhead  required  for 
the  pipeline  to  reach  full  efTiciency  will  be  ignored,  i.e.,  steady  state 
operation  is  assumed. 

A  high  level  description  of  this  computation  at  stage  i-1  is  as  follows: 

1.  Initialize  the  value  of  A(i)  to  constant  c,. 

2.  Find  the  value  of  the  discriminant  function  for  class  i. 

3.  Compare  the  value  of  the  discriminant  function  for  class  i  with  the 
value  of  tempj  I  received  from  the  previous  stage. 

4.  Set  the  value  of  temp-class,  based  on  the  comparison  in  step  3. 

5.  Transfer  the  pixel  data,  temp;  and  temp-classj  to  the  next  stage. 

6.  If  there  is  more  data,  go  to  step  I. 

7.  End. 

The  S-Net  of  stage  i-1  of  the  pipeline  is  shown  in  Figure  :}.I.S.  A  brief 


Stage 


so 


description  for  the  S-Net  is  as  follows: 

-  Figure  3.1.8  describes  the  computations  of  one  processing  elemenf,  so 
it  has  only  scalar  places. 

-  The  initial  marking  of  the  S-Net  is 
Ko(S,)  =(1); 

^0(^2)  ~  ^0(^3)  “ .  “  ^'^0(^22)  “ 

-  Descriptors  are; 

D[t,]  =  m  = . =  »>(‘2.sl  =  ]; 

-  The  token  in  S,  indicates  that  (his  stage  has  receiv('d  the  3-ttiple  data 
transfer  from  the  previous  stage.  Once  the  data  have  been  received, 
transition  t,  is  enabled  and  fires. 

-  In  most  other  respects,  execution  |)roceeds  ns  in  (he  SIMI)  case. 

For  calculating  the  degree  of  parallelisip,  we  assume  the  S-Net  go('s 
through  the  transition  to  (04.  is  the  total  time  units  required  to  output 
one  classified  pixel.  The  timing  analysis  of  the  execution  of  the  S-Net  from  t, 
to  t24  is  shown  in  Table  3.1.1.  We  have 

^tot  ~  51)“+ I  In +8  time  units 

Since  M  FIOs  are  active  in  parallel,  (he  average  parallelism  h  is  (assume  n  =  1) 
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h  = 


24 

V  n  *v  *g\ 

1=1 _  _  (5ti~  -t-  lOii  -f  r>)M  +(n  +3)0 

^^toi  Tm*  +  1 1  n  +  S 


125  Nt 
132 


The  lit iliz;i lion  of  fho  M  l*l]s  is 


125 


0.911 


M  132 

Thus,  the  utilization  of  the  1021  l*l's  is  91.7'V. 

The  total  time  units  required  to  classify  one  image  for  the  SIMI)  and 
pipeline  architectures  are  as  follows.  For  SIMD, 

^  siMi)  -  [  *  +  +  6mn  +  3mn-j  M  =  1.315  x  10®  time  units 

For  pipeline. 

pipeline  ~  +  lln  +  SjmM  =  2.163  x  10®  time  units 

Since  Tpjpjjj„,  >  Tsjmd.  it  has  been  shown  that  the  pipeline  requires  more  time 
to  classify  an  image,  even  though  the  utilization  of  the  FKs  is  greater.  The 
reason,  of  course,  is  that  for  the  pipeline  the  overhead  resulting  from  the 
parallelism  is  greater.  The  pipeline  could  be  made  faster  if  the  speed  of  the 
individual  I’Ks  could  be  increased  through  taking  advantage  of  their  much 
more  specialized  tasks  and  interconnections. 


3.2.  SIMD  vs.  MIMI) 

In  the  previous  section,  the  description  of  .SIMD  processors  is  discussed 
using  a  graphic  representation  called  S-Nets  which  is  .an  extiuision  of  Fetri  nets. 
.Several  performance  measures  such  as  .Average  \  ector  I’arallelism  and  .Average' 
Parallelism  were  described  by  which  to  evaluate  the  perfonii.ance  of  SIMI) 


proces.sors. 
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In  this  section,  the  cofi)i)arison  hotween  execution  times  of  both  SIMD  and 
MIMI)  processors  is  disciissc'd.  When  a  particular  algorithm  such  as  the 
maximum  likelihood  «lassifi<al ion  algorithm  or  the  clustering  algorithm  is 
given,  it  is  nece.s.sary  to  :isk  which  mode  of  parallelism,  SIMI)  or  MIMD,  will 
have  less  execution  lime  and  better  utilization  of  PE  resources  if  the  number  of 
processing  elements  is  fixed.  In  order  to  ,answer  such  questions,  this  section  will 
analyze  explicit  and  implicit  operations  [23]  embedded  in  an  algorithm  and, 
based  on  these  operations,  derive  the  total  execution  times  for  both  SIMI)  and 
.MIMI)  modes  of  parallelism.  In  addition,  the  potential  advantages  and 
dis.ad vantages,  inherent  in  an  algorithm,  of  .MIMI)  mode  of  parallelism  will  be 
discussed  as  compared  to  SIMI)  mode  of  parallelism. 

3.2.1.  The  Flow  Chart  and  the  .Algorithms  of  the  Supervised  Relaxation 
Operator 

The  flow  chart  of  the  supervised  relaxation  operator  is  shown  in  Fig.  3.2.1. 
The  algorithms  in  blocks  .A,  H,  and  C  can  be  executed  simultaneously  because 
the  input  data  in  these  three  blocks  are  independent  of  each  other,  and  the 
algorithms  are  independent  of  results  produced  by  each  other.  The  algorithms 
can  be  implemented  eitha'r  in  SIMI)  nnxle  aar  in  MIMI)  mode  depending  on  the 
algorithm  characteristics,  whi<h  will  lae  described  in  the  following  sa'ctions. 
Since  the  algorithm  in  blo<k  I)  needs  the  results  from  I>locks  .A.  11  and  C  to 
execute  its  instructions,  it  is  best  that  blocks  A,  H  and  C  produce  their  results 
almost  at  the  same  time,  then  block  I)  only  needs  to  wait  as  little  time  as 
possilde  to  proceed  with  its  execution  of  instructions.  The  algorithms  in  blocks 
.\.  II.  and  ('  are  (piite  dilfereni.  Tlierefore  their  complexities  in  terms  of 
execution  time  are  also  (juite  dilfereni.  In  order  to  produce  the  ri'siills  almost 
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simuKancoijsly  and  provide  befter  utilization  of  I’ICs,  it  is  necessary  to  assign 
different  numbers  of  I’Ks  and  different  modes  of  parallelism  to  the 
implementation  of  these  algorithms. 

The  algorithm  in  block  A  is  actually  the  maximum  likelihood  classification 
algorithm  except  for  the  multiplication  by  a  priori  probability,  P(Xi),  for  each 
cla.ss.  If  we  assume  that  l’(Xi)  =  l/N  where  is  the  number  of  classes,  these 
tw(»  algorithms  are  exactly  the  .same  except  for  a  constant  factor.  The 
algorithm  in  block  A  calculates  the  initial  probability  of  every  class  for  each 
pixel.  Since  this  algorithm  involves  the  multiplication  of  vector  and  matrix, 
matrix  inversion,  calculation  of  a  matrix  determinant,  and  an  exponential 
operation,  the  execution  time  of  each  pixel  depends  on  the  multichannel  data 
values.  ICspecially  for  the  exponential  operation,  its  execution  time  is  a 
function  of  argument  v.ilue  l)ecause  it  is  a  complex  operation.  An  exponential 
operation  is  computed  as  a  power  series  and  the  convergence  of  the  power 
series  depends  on  the  argument  value.  In  SIMD  mode,  the  control  unit 
broadcasts  instructions  to  all  active  PEs  and  all  active  PEs  execute  the  same 
instruction  on  the  data  in  their  own  memories.  Indeed  all  steps  are 
synchronized  If  this  algorithm  is  implemented  in  SIMI)  mode,  each  PF) 
executes  the  instruction  on  a  different  data  value.  As  a  result,  each  PE  has  a 
different  execution  time.  Therefore,  the  PEs  with  shorter  execution  times  have 
to  wait  until  the  PE  with  longest  execution  time  completes  its  execution.  Then 
the  control  unit  can  broadcast  the  next  instruction  to  all  PEs.  The 
disadvantage  of  using  SIMI)  mode  is  that  the  total  execution  time  is  equal  to 
the  summation  of  the  maximum  execution  times  of  every  .)peration  because  the 
execution  time  of  the  operation  is  data  dependent.  The  advantages  of  using 
SIMI)  mod  e  is  that  scalar  instructions  executed  by  the  control  unit  and  array 


instructions  executed  by  the  I’Ks  can  be  overlapped. 

On  the  other  hand,  if  the  algorithm  in  block  A  is  implemented  in  MIMI) 
mode,  each  f'E  can  execute  an  instruction  fetched  from  its  own  memory 
immediately  after  completing  the  previous  instruction.  Therefori*  the  total 
execution  time  is  thi*  summation  of  the  execution  times  of  evi>ry  operation  in 
the  PE.  If  we  assume  that  the  data  values  are  randomly  distributed 
throughout  the  PEs,  the  probability  of  getting  the  data  with  longer  execution 
time  is  the  same  for  every  PE.  With  this  property,  the  total  execution  times  of 
PEs  in  MIMD  mode  almost  have  the  same  values  if  the  number  of  pixels  in 
each  PE  is  large  enough. 

The  algoritlim  in  block  M  involves  only  tlu'  accumulative  operations  of 
counting  both  individual  and  joint  occurrences,  and  division  operations.  The 
execution  times  of  these  two  operations  are  almost  data  independent. 
Therefore,  no  benefit  can  be  drawn  from  implementing  in  MIMI)  mode. 
Furthermore,  the  algorithm  has  window-type  operations  and  needs  inter-PE 
transfers.  SIMD  mode  can  offer  synchronized  inter-PE  transfers.  That  is,  all 
active  PEs  can  transfer  data  to  the  neighboring  PIvs  located  in  the  same 
direction  relative  to  the  transferring  PEs.  Since  operations  are  simple  and 
execution  times  are  short,  fewer  PEs  are  a.ssigned  to  block  B  when  compared  to 
block  A  in  order  for  block  A  and  B  to  take  same  amount  of  time  to  execute. 
SIMD  mode  is  implemented  to  t.ake  advantage  of  synchronized  inter-PIC 
transfers  and  simple  control  system. 

The  algc'rithm  in  block  (’  consists  of  two  subalgorithms.  One  determines 
the  aspect  angle  of  the  pixel.  From  this  aspect  angle,  the  other  subalgorithm 
calculates  the  .species  distribution  function  of  the  classes  which  are  po.ssibly  to 
be  assigned  to  the  pixel.  The  subalgorithm  to  determine  the  aspect  angle  has 
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window  type  operations  and  needs  inter-PE  transfers.  The  subaigorithm  to 
calculate  the  distribiiting  function  is  a  neighbor-independent  algorithm  and 
involves  complicated  mathematical  operations.  If  we  assume  that  the 
distribution  of  each  class  is  (laussian  and  that  the  mean  and  variance  of  every 
class  are  already  known  and  stored  in  the  PIC's  memory,  the  relative  likelihood 
of  finding  each  class  for  the  current  pixel  can  be  calculated  plugging  the 
elevation  value  into  the  (jaussian  distribution  function.  As  a  result,  more  PEs 
are  needed  than  in  block  11  to  speed  up  the  process  in  order  to  synchronize  the 
results  produced  from  blocks  A,  B,  and  C  and  feed  them  to  block  D. 

The  algorithm  in  block  D  is  the  supervised  rela.xation  operator.  It  involves 
operations  such  as  addition,  multiplication  and  division  VVe  assume  that  these 
operations  only  span  a  narrow  range  of  time  if  the  argument  value  varies. 
Therefore  this  algorithm  does  not  need  to  be  implemented  in  MIMD  mode. 
Instead  it  is  implemented  in  SIMD  mode.  Every  instruction  is  synchronized 
and  the  control  system  is  simpler  than  that  in  MIMD  mode. 

.1.2. *2  Detailed  Description  of  the  .Algorithm  in  Block  (' 

The  algorithm  calculates  the  aspect  angle  [1]  by  numerically  differentiating 
the  topographic  data  to  produce  an  e.stimate  of  the  gradient  vector  at  each 
pixel  location.  Then,  the  direction  of  the  vector  is  used  as  aspect  angle.  The 
approximate  gradient  at  line  i  and  column  j  is  computed  as  follows  [I]: 

V2  ~  l(Zj  ,  j  -  ^(^1.;  I  ~  +  (.1.2.1) 

where 

S/7j  is  the  gradient  vector 

Zij  is  the  topographic  elevation  value  at  i,j 
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i,j  are  line  and  column  coordinates 
I  and  7  are  line  and  column  unit  vectors 
The  aspect  angle  is  calculated  as  follows  [I]; 


Q  -  tan 


i.j 

(^ij  1  ~  + 


(3.2.2) 


where 

a  is  the  direction  of  slope  measured  clockwise  from  north. 

The  subalgorithms  for  both  aspect  angle  and  species  distribution  are 
shown  in  Fig.  3.2.2.  The  subalgorithm  for  calculating  the  aspect  angle  of  the 
current  pixel  decides  that  a  pixel  faces  either  south  or  north.  Forest  species 
distribution  as  a  function  of  elevation  and  aspect  in  the  San  Juan  Mountains  is 
shown  in  Figure  2.1.4  [1].  In  this  figure,  different  aspects  have  different  forest 
species  distributions.  That  is  the  reason  why  the  aspect  angle  is  needc'd  before 
calculating  the  distribution  value.  Con.stant  C  in  Fig.  3.2.2  is  a  threshold  to 
distinguish  angle  of  south  from  that  of  north.  In  Fig.  3.2.2,  the  same 
subalgorithm  is  used  to  calculate  the  species  distribution  of  south  facing  and 
north  facing  pixels.  But  the  actual  parameters  such  as  m[k]  and  <T[k]  which  are 
the  mean  and  the  standard  deviation  of  class  k  are  different  for  south  facing 
and  north  facing  pixels.  The  constant  f'lk]  in  the  subalgorithm  is  a 
precalculated  value  stored  in  the  memory. 


3.2.3  Implicit  Operations  in  the  Algorithm 

Some  of  the  statements  in  the  following  two  sections  are  quoted  from  [23]. 
The  conventional  complexity  analysis  of  an  algorithm  gives  no  knowledge 
about  how  fast  an  algorithm  will  execute  for  a  given  fa>k  of  si/c  N.  Therefore 
this  kind  of  analysis  gives  only  the  order  of  magnitude  of  the  time  that  an 
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/♦Algorithm  for  calculating  aspect  angle.*/ 

for  (i=0;  i<n;  i+  +)  /*subimage  size  is  nxn  */ 

for  (j=0;  j<n;  j  +  +){ 

V=(z(i-lj[jj-zli  +  ll[jl)/(z[i][j-l)-z(i)[j  +  l]); 
o  =  tan  ‘V;  /*  ft  is  the  aspect  angle  */ 

if  (ft  >  C)  /*  pixel  is  south  facing  */ 

calculate  <p  /*Thi.s  is  a  subroutine  which  calculates  species  distribution*/ 
else  /*  pixel  is  north  facing  */ 

calculate 

} 

/*  Subroutine  to  calculate  species  distribution  function  of  each  class  for  pixel 
of  either  south  facing  or  north  facing.  */ 

{2  tnj, 


/*  ^(k)=cv 


where  = 


for  (k  =0;  k<N;  k  +  + )  {  /*  N  is  the  number  of  classes  */ 

temp  =-(z(il[jl-m[k])*(z(i](jl-mlk))/(2*(T(k)  *(T[kl); 

0(k!=C[kl  *exp(temp);  /*  <^(k)  is  the  distribution  value  for  class  k.  */ 

} 


Fig.  3.2.2.  The  Algorithm  for  Aspect  Angle  and  Species  Distribution 


algorithm  will  take  for  a  given  size  N,  such  as  “order  N"  or  “order  N”."  Several 
papers  have  given  the  analysis  of  execution  time  by  counting  the  number  of 
explicit  operations  that  an  algorithm  will  perform  (24,2a|.  However,  there  are 
many  implicit  operations  neglected  in  the  analysis  such  as  the  calculation  of 
the  real  address  of  index  variable  z(i)(j),  moving  operands  from  memory  into 
machine  registers,  and  moving  results  from  machine  registers  back  into  memory 
[1].  Counting  only  explicit  operations  gives  only  an  analysis  of  explicit 
arithmetic  operations,  and  not  the  implicit  operations.  Often  these  implicit 
operations  can  have  a  significant  impact  on  the  execution  time  of  an  algorithm 
and  therefore  should  not  be  ignored. 

In  Fig.  3.2.2,  there  are  many  indexed  variables  such  as  z[i][j],  m(k]  and 
(T[k].  Let  us  look  one  example  to  identify  implicit  operations.  I'or  t'xample.  in 
order  to  get  the  value  of  m[k|  into  a  machine  register,  the  base  addrc'ss  of  array 
m  must  be  found,  put  into  a  machine  register,  added  to  i,  and  the  resulting 
address  used  to  load  the  value  of  m[k].  As  another  example,  loading  the  value 
of  z[i][j]  into  a  machine  register  rerpiires  that  i  be  multiplied  by  the  row  size  of 
the  z  array,  added  to  j,  the  result  added  to  base  address  of  the  z  array,  and  the 
resulting  address  used  to  load  tin*  value  of  zli]lj].  .Alternatively,  to  avoid  the 
multiplication,  an  indirection  table  could  be  usmI.  d'lie  ini'thod  reipiiri's  that  i 
be  added  to  the  base  address  of  the  indiri'ction  table,  the  resulting  address  used 
to  load  the  address  of  the  ith  row  of  array  z.  which  is  added  to  j  to  get  the 


address  of  z[i](j].  Finally,  this  address  is  used  to  load  the  data  itself.  While 
this  method  saves  a  multiplication,  it  requires  an  additional  memory  load  and 
space  for  the  indirection  table. 

The  notation  Mjaddress]  will  be  used  to  denote  a  memory  reference  to 
address.  The  implicit  operations  required  to  access  operands  are  summarized  in 
Table  3.2.1. 

Before  counting,  in  Section  3.2. 1,  the  total  number  of  implicit  and  explicit 
operations  in  the  program  shown  in  h'ig.  3.2.2,  several  assumptions  [23]  are 
listed  as  follows: 

1.  The  class  means  and  standard  deviations,  m[k]  and  (rjk),  k  =  1,2,...,N, 
have  been  stored  in  each  I'K. 

2.  C[kj  =  I/(v/^cr[k]),  k  =  1,...,N,  have  been  precalculated  and  stored  in 
each  BF. 

3.  There  are  not  enough  machine  registers  to  hold  all  data  of  the 
program  shown  in  Fig.  3.3.2. 

4.  There  are  enough  machine  data  registers  to  hold  all  index  variables 
currently  in  use  such  as  i,  j,  and  k  in  the  program. 

5.  There  are  enough  machine  data  registers  to  hold  all  temporary  results 
such  as  V  and  temp,  and  repeated  expressions  such  as  (T[k]  and  a 
compiler  is  capable  of  recognizing  and  exploiting  this  fact.  That  is. 
the  compiler  will  not  generate  store  and  re-load  operations  in  these 
situations. 

t).  There  are  enough  address  registers  to  hold  the  addresses  of  all  single 
\ariables  suc  h  as  \  and  i\.  .and  the  base  addressi-s  of  array  variables 


such  as  z,  m.  and  (t. 


Table  3.2.1.  Implicit  Operations  to  Access  Operands  [23] 


To  Access 

Notation 

Operations 

c 

M(c| 

1  memory  reference 

m[k] 

M[ba.se  of  ni  +  k] 

I  addition 

1  memory  reference 

M[M[base  of  z  +  i]+j] 

2  additions 

2  memory  references 

7.  Variables  already  iri  machine  registers  have  no  implicit  operations 
associated  with  them. 

8.  Implicit  operations  include  the  movement  of  constant  data  and  new 
addresses  from  the  instruction  stream  into  data  and  address  registers. 
When  variables  are  first  used,  their  addresses  are  migrated  from  the 
instruction  slrc-am  to  address  registers,  .\ddresses  and  data  in 
registers  are  discarde<l  when  no  longer  needed. 

I  sually,  different  algorithms  spend  a  different  portion  of  run  time 
executing  implicit  operations.  Therefore  the  relative  times  needed  to  perform 
various  types  of  operations  are  important.  For  many  processors,  multiplication 
and  (lo'ision  op<'ratioiis  take*  from  10  to  aO  times  as  long  as  addition  or 
subtraction  operations.  d'lius  if  a  prt)gram  contains  many  exidicit 
multiplication  and  division  operations,  the  effect  of  the  im])l!cil  operations  may 
not  be  significant.  On  the  other  hand,  many  algorithms  contain  no 
multiplication  and  divi.sion  operations  at  all.  Ihuler  these  circumstances, 
explicit  operations  alone  give  a  very  poor  estimate  of  the  total  algorithm 
execution  time.  Realizing  the  importance  of  the  relative  times  of  various 
operations,  a  list  of  times,  in  cycles,  for  various  operations  is  givon  in  Table 
•'5.2.2  [2.'5].  The  limes  given  are  for  the  Motorola  MCtiSOOO,  a  typical  modern 
multiregister  processor  The  internal  cycle  time  for  an  SMIIz  MC  (>8000  is  2v>0 
ns.  All  of  the  figures  given  are  m  "internal  cycles  ' 


Table  3.2.2.  Cycle  Times  of  Various  Operators  [23] 


OPElfANl) _ (  VCM:s _ SIZE 


(operands  in  registers) 
addition,  subtraction,  (A,S) 

2 

word 

3 

long 

boolean,  comparison 

2 

word 

3 

long 

shift 

3  +  shiftcount 

word 

multiplication  (M) 

35 

w'ord*W()rd  =long 

division  (D) 

75 

long/word  =word 

(address  in  a  register) 
load,  store  (R ) 

1 

word 

6 

long 

(immediate) 

load  immediate  address  or  constant  data 

1 

word 

6 

long 

(operand  in  a  register) 
test  and  branch  on  condition 

5 

(address  in  a  register) 
subroutine  call 

9 

save/restore  n  registers  on  the  stack 

'1  +  '1n 

subroutine  return 

9 

interrupt 


21 


$ 


y 

s 

I  3.2.4.  Comparison  Between  SIMI)  and  MIMl) 

^  Some  assumptions  of  the  processing  environments  of  SIMD  and  MIMD  [23] 

are  repeated  here.  For  SIMD  mode: 

I  1.  The  control  unit  broadcasts  the  instruction  stream  to  all  active  PEs 

and  all  active  PlCs  execute  the  same  instruction  simultaneously  on  the 
data  in  their  own  memory.  While  active  PlCs  execute  the  instructions 
I  in  parallel,  the  control  unit  can  do  scalar  instructions  such  as  loop 

counting,  branching,  and  subroutine  calls  and  returns.  These  control 
unit  operations  are  overlapped  with  PE  operations  and  thus  do  not 
contribute  to  the  overall  algorithm  execution  time.  Since  mask 
operations  are  decoded  in  the  control  unit.  PIC  addre.ss  and  general 
masking  operations  cost  nothing  in  the  total  algoiithm  run  time. 

2.  Data  transfers  through  the  interconnection  network  are  performed 
simultaneously.  The  network  transmission  delay  for  a  circuit- 
switched  multistage  network  is  Ie.s.s  than  2  cycles.  The  total  cost  of 
an  intor-PlC  transfer  is  4  +  2  +  I(  =  10)  cycles  for  the  load-tran.sfer- 
store  sequence. 

3.  ‘if  any,”  "if  all”  or  "where”  conditional  steps  require  about  25  cycles. 
Control  unit  and  PE  operations  cannot  be  overlapped  for  these 

statements. 

For  MIMD  mode: 

I.  PICs  fetch  instructions  from  their  own  memory  and  do  all 
com|)utat ional  and  control  (branching)  operations. 


The  data  transfers  through  the  interconnection  network  operate 
asynchronously.  Each  transfer  causes  an  interrupt  at  the  destination 


PE.  The  time  to  service  the  interrupt  includes  the  time  to  save 
registers,  call  the  interrupt  service  routine,  load  the  incoming  data 
from  the  I/O  port,  store  the  data  in  an  appropriate  hulTer,  and  return 
to  the  interrupted  routine. 

The  algorithm  to  calculate  the  aspect  angle  of  the  pixel  includes  window 
type  operations  and  therefore  requires  inter-PR  data  transfers.  Th^-  image  to 
be  processed  is  superimposed  on  the  PEs,  which  are  arranged  in  a  mesh-type 
array.  The  image  is  divided  among  the  PEs  and  each  PIC  is  responsible  for  the 
corresponding  subimage.  The  window-type  operation  needs  data  from  one  or 
two  neighboring  PICs  when  it  processes  any  border  pixel  of  its  own  subimage. 
In  SIMD  mode,  these  data  transfers  are  performed  simultaneously  for  all  PICs 
via  the  interconnection  network.  Specifically,  for  an  nxn  subimage,  a  total  of 
4n  parallel  transfers  are  needed  for  the  algorithm  to  calculate  an  aspect  angle. 
From  assumption  2  for  SIMD  mode,  each  parallel  transfer  costs  10  cycles  to 
complete  the  operation.  On  the  other  hand,  in  MIMD  mode,  each  data  transfer 
causes  an  interrupt  at  the  destination  PE  and  this  operation  is  carried  out 
asynchronously  by  the  PE.  Assuming  the  same  size  subimage,  each  PE  causes 
4n  interrupts  at  the  destination  PE,  and  gets  interrupted  4n  times  by  other 
PEs  when  data  are  ready  to  be  transferred.  Therefore,  each  PE  totally  has  8n 
interrupts  for  MIMD  mode.  Deadlock  is  the  situation  which  occurs  when  two 
PEs  interrupt  each  other  and  each  waits  for  the  other  PIO  to  continue. 
Deadlock  can  be  prevented  if  the  procedures  below  are  followed  [2B]: 

1.  If  a  PE  gets  interrupted,  then  no  other  PEs  can  interrupt  this  PE. 

2.  If  the  same  PE  gets  interrupted  simultaneously  by  more  than  one  PE, 
one  of  them  is  given  a  higher  priority. 
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3.  If  PlOi  and  IMCj  inti'rriipt  rach  other,  then  eaeli  IMO  has  the  eapal)ility 
to  grant  one  of  tlu*  interrupts  and  disable  the  other.  For  example, 
F^Ei  has  the  eapal)ility  to  detect  that  it  interrii[)ted  and  that  FICj 
interrupted  PICi.  Therefore,  if  PEi  has  higher  priority,  PEj  will  grant 
PEi’s  interrupt  request  and  PEj  w-ill  withdraw  its  interrupt  to  PEi.  A 
similar  situation  occurs  if  Pl'i  interrupts  Pl]j  and  IMCi  gets  interrupted 
simultaneously  by  Pl!k. 

Since  the  loop  counting  o|)erations  in  SlMl)  mode  are  ('xeculed  by  the  Cl' 
and  overlapped  with  array  instructions  executed  by  PEs,  they  cost  nothing  in 
the  total  algorithm  run  time.  On  the  other  hand,  loop  counting  operations  in 
MIMD  mode  are  executed  by  the  PEs  and  not  overlapped  with  array 
instructions.  For  example,  the  “C"  program  instruction  in  Fig.  3.2.2  » 

for  (i  =  0;  i  <  n;  i  +  + ) 

requires  n  additions  and  n  +  1  comparisons.  These  operations  are  in  the 
category  of  loop  counting  operations. 

There  is  no  memory  contention  problem  in  SIMD  mode  since  the  elevation 
values  z(i](jl  of  the  nxn  subimage  and  all  the  parameters  such  as  m(k)  and  (r[k] 
are  stored  in  the  corresponding  memory  of  the  PEs.  PEs  only  fetch  data  from 
their  own  memories.  Ihiwever,  there  are  two  alternative  memory  organizations 
possible  in  MIMl)  mode  One  is  the  MIMD  mode  with  global  memory  which 
stores  the  parameters  such  as  m[k]  and  <7[k].  but  the  o'cvation  values  for  the 
subimage  in  each  I’E  are  stored  in  local  PE  memory.  Serious  memory 
contention  will  occur  when  every  PE  tries  to  access  m[k)  and  <T[k]  to  calculate 
the  frequency  responses  of  all  classes.  The  other  MIMD  mode  uses  only  local 
memory.  In  this  case,  all  parameters  and  data  are  stored  in  local  memory.  Of 
course,  it  requires  more  total  memory.  .All  PIvs  can  access  the  data  from  their 
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own  moniorios  without  memory  contention.  Her**,  we  consider  th<‘  MIMI)  modi' 
with  local  memory  only. 

The  algorithm  shown  in  I'ig.  3.2.2  requires  (2  +  N)  suht raclions,  (I  +  \) 
divisions.  4N  multiplications,  1  comparison.  1  conditional  step,  1  arctangent, 
and  N  exponential  operations  per  pixel  where  N  is  the  number  of  cl.asses. 
These  operations  are  in  the  category  of  explicit  operations. 

Now  we  are  in  a  position  to  enurin'rate  the  implicit  operations  associated 
with  the  program  shown  in  Fig.  3.2.2.  It  is  found  that  the  algorithm  requires  1 
operation  to  move  the  constant  2  to  a  data  register,  9  oper.ations  to  move  the 
addresses  of  variables  such  .as  V,  z,  a...  etc.  to  address  registers,  (IN +  11) 
additions,  and  ( IN  + 10)  memory  references  per  pixel.  A  memory  reference  is 
either  a  memory  store  or  a  memory  load  operation. 

The  number  of  operations  involved  in  executing  the  algorithm  in  SIMI), 
MIMD,  and  serial  modes  are  listed  Table  3.2.3.  The  operations  to  move  data 
and  addresses  into  data  and  address  registers  arc  included  in  immediate 
operations.  In  SIMD  mode,  data  transfers  are  via  interconnection  networks  and 
not  via  interrupt  operations.  Loop  counting  operations  are  overlapped  with  PFv 
instructions.  The  same  situation  doesn’t  occur  in  either  MIMD  or  serial 
processor  n'ocies  In  MIMI)  mode,  data  transfers  are  executi'd  on  demand  via 
interrupt  operations  but  not  through  |»arallel  transfers.  In  serial  processor 
mode,  the  overhead  of  parallelism,  pre.sent  in  both  .SIMI)  and  MI.MI)  modes, 
doesn't  exist. 

In  Table  3.2.3,  the  execution  times  of  most  opc'rations  are  argument- 
independent  except  two:  the  arctangi'iit  and  exponenti.il  operations.  In  SIMI) 
mode,  since  every  instruction  is  a  lockstep  operation,  the  I’Fs  with  shorter 
instruction  execution  time  has  to  wait  for  compli'tion  of  the  I’b^  with  longest 
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Table  3.2.3.  Operations  in  SIMI),  MIMD,  and  Serial  Processor 


SIMI) 

MlMl) 

si:ki.\l 

Interrupt 

0 

8n 

0 

Parallel  Transfer 

4n 

0 

0 

Loop  Counting; 
addition 
comparison 

0 

0 

EiBEBEiflSjjS 

Implicit  Operation; 
addition 

memory  reference 
immediate 

n-(lN  +  l  1) 
n'(  tN  + 10) 
10 

n-(4N  +  ll) 
n2(4N  +  10) 

10 

M-(4N  +  14) 
M2(4N  +  10) 

10 

Explicit  Operation; 
subtraction 
division 
multiplication 
comparison 
condition 
arctangent 
e.xponential 

n’(2  +N) 
n“(l+N) 
n2(4N) 

O 

n“ 

o 

n* 

•> 

n“ 

n-N 

n-(2  +  N) 

n2(H-N) 

n2(4N) 

n** 

«> 

ir 

o 

ir 

n-N 

M2(2  +  N) 
M2(1+N) 
M2(4N) 

m2 

m2 

m2 

.M2N 

M  X  M  —  image  size 
n  X  n  =  subimage  size 


N  =  number  of  classes 
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instruction  execution  time.  Therefore,  the  total  algorithm  run  time  is  th(‘ 
summation  of  the  maximum  execution  titiies  of  ev<'ry  operation.  'I'he  lockstep 
operation  mode  will  thereby  lead  to  inellicient  utili/ation  of  I’l^s  in  SIMI).  On 
the  other  hand,  F’lOs  in  MIMI)  mode  can  execute  the  next  instruction 
immediately  after  completing  the  current  instruction.  Therefore,  parallelism  in 
MIMD  mode  has  the  potential  to  save  execution  time.  The  cycle  times  of 
various  operations  are  listed  in  Table  3.2.2.  But  there  is  no  data  available  from 
which  to  estimate  the  range  of  the  execution  times  of  eith(*r  arctangent  or 
exponential  operations.  For  most  procesviors,  multiplication  and  division 
operations  take  from  10  to  50  times  as  long  as  addition  or  subtraction 
operations.  [Exponential  and  arctangent  operations  are  comi)lex  operations 
which  involve  many  multiplication  and  division  operations.  Therefore,  a 
reasonable  range  of  execution  times  of  these  two  operations  are  assumed  in 
order  to  carry  out  the  comparison  of  the  execution  times  of  SIMI)  and  MIMI) 
modes.  Further,  it  will  be  assumed  that  the  argument  values  are  randomly 
distributed  throughout  the  whole  image  so  that  the  algorithm  run  time  is 
approximately  equal  to  the  sum  of  the  means  of  every  operation  execution  time 
in  each  PE. 

The  plane  of  execution  times  for  SIMD  mode  is  shown  in  Fig.  3.2.3  under 
the  assumption  that  the  range  of  maximum  execution  times  of  both  operations 
is  from  200  to  400  cycles.  Other  assumptions  used  to  calculate  the  algorithm 
execution  time  are  that  an  interrupt  costs  21  cycles,  a  parallel  transfer  10 
cycles,  an  addition  2  cycles,  a  comparison  2  cycles,  a  memory  reference  I  cycles, 
an  immediate  operand  4  cycles,  a  subtraction  2  cycles,  a  division  75  cycles,  a 
multiplication  35  cycles,  a  conditional  step  25  cycles;  the  subimage  size  is 
n  =32,  the  number  of  cla.sses  N  =5,  and  the  whole  image  size  M  =256.  The  z- 


no 


axis  is  pxociifion  tinu*,  tho  x-axis  is  Iho  range  of  the  execution  time  of  the 

arctangent  operator,  and  the  y-axis  is  the  range  of  the  execution  time  of  the 

exponential  operator.  'I'he  range  of  the  mean  instruction  execution  time  is 

assumed  to  be  from  ISO  to  dGO  cycles  for  both  operat()rs.  The  plane  of  the 

execution  time  for  MIMl)  mode  is  also  shown  in  Fig.  3.2.3.  If  the  range  of  the 

execution  time  is  dilferent  from  what  we  have  assumed  here,  the  results  are 

% 

('xpecled  to  be  similar.  'I'lu'  results  shown  in  Fig.  3.2.3  only  indicate  the  trend 
of  the  dilTeri'iice  of  the  execution  limes  between  SlMD  and  .MIMD  modes.  This 
figure  shows  that  MIMl)  mode  has  less  execution  time  than  SIMI)  mode. 
Although  scalar  operations  can  not  overlap  with  array  operations  in  MIMD 
mode,  the  execution  time  saved  due  to  efficient  utilization  of  PEs  in  MIMD 
mode  leads  to  less  execution  time  and  better  utilization  of  PEs  in  MIMD  mode. 

3.3  Alternative  Performance  Criteria 

Feature  enhancement  methods,  commonly  used  in  producing  maps  from 
imagery  data,  enhance  or  emphasize  information-bearing  attributes  of  the  data 
while,  ideally,  suppressing  other  “noise*’  characteristics.  The  distinction 
between  “information”  and  “noise”  is  highly  application-dependent.  For  the 
purpose  of  this  .. udy,  attention  is  focu.sed  on  spectral  similarity,  an  image 
attribute  which  is  commonly  found  to  be  useful  in  scene  analysis.  If  the 
imagery  is  black-and-white,  spectral  similarity  reduces  simply  to  tonal  (gray¬ 
scale)  similarity.  A  clustering  algorithm  will  be  described,  including  a  parallel 
implementation,  which  can  be  used  to  identify  sets  of  spectrally  similar  pixels. 
'Fhe  pixels  need  not  be  spatially  contiguous,  but  simply  have  the  same  “color” 
in  a  generalized  sense  This  algorithm  will  be  used  to  illustrate  alternative 
performance  criteria  for  evaluating  parallel  (SlMD)  algorithms. 


In  general,  the  complexity  of  SIMl)  algorithms  is  a  function  of  the  problem 
size  (number  of  elements  in  the  data  set  to  be  processed),  machine  size  (number 
of  l’b.;s).  and  the  intercoiiiieclion  network  used  to  provide  communications 
among  the  I'lCs.  I'or  example,  an  algorithm  which  uses  N  PKs  to  execute  some 
operation  on  an  M  x  M  image  will  exhibit  different  “performance”  for  different 
values  of  N.  The  obvious  use  of  performance  measures  is  for  selecting  from 
alternative  .SIMI)  algt)rifhm.s.  For  a  given  SIMD  machine,  different  algorithms 
for  performing  a  particular  task  can  be  compared.  The  algorithm  which 
performs  best  based  on  the  desired  measurement  criteria  can  then  be  chosen. 
As  another  use  for  performance  measures,  consider  the  situation  where  the 
typical  values  of  image  size  .\1  are  known.  Then  a  measure  of  the  way  in  which 
the  machine  size  affects  the  performance  of  the  application  algorithms  will  be 
useful  in  deciding  how  many  PR’s  the  system  should  have.  Lastly,  given  a 
reconligurable  system  [-7j.  the  machine  size  can  be  tailored  to  the  problem  size 
for  execution  of  a  given  algorithm  if  there  exists  performance  criteria  for 
comi)aring  different  clu)ices  of  machine  size.  The  goal  of  this  section  is  to  study 
the  relationships  among  the  various  parallel  configurations.  In  order  to 
demonstrate  one  way  in  which  the  measures  can  be  applied,  an  SIMD  clustering 
algorithm  is  presented.  In  this  example  algorithm  evaluation,  both  the  image 
size  and  the  machine  size  :ire  v.iried.  permitting  the  performance  of  the 


algorithm  to  hr  examined  and  compared  under  a  varii'ty  of  conditions. 


3.3.1  A  Clustering  Algorithm 


Given  n-dimensional  vectors  =  [ajSo,  •  .  .  .  a,,]^  and 

Xj,  =  (bjb^,  .  .  .  .  bj^,  the  Euclidean  distance  between  the  vectors  is  defined  by 

l'/2 

D  =  V  (aj-bip 
i  =  I 

Given  a  population  of  n-dimensional  vectors  normally  distributed  N(l'j,V  ) 
and  a  second  population  normally  distributed  N(lJj,V.).  where  Ej,  Ej  are  the 
population  means  and  V.,  are  the  population  covariance  matrices,  the 
distance  between  these  populations  is  defined  to  be  the  divergence 

'>ii  =  ^  '-i:i ')] 

The  following  clustering  algorithm  is  based  on  the  ISODAT.A  algorithm  of 
Hall  and  Hall  [28].  It  groups  vectors  in  such  a  way  as  to  minimize  the  sum-of- 
squared-error  (S.SIC): 


SSE  rr.  X-\1J- 

- S  _ _ J  1} 


where 

c  =  number  oi  clusters 

Cj  =  the  set  of  vectors  belonging  to  the  ith  clusler 
M,  =  the  mean  vector  for  the  ith  clii'.ter 


Intuitively  the  vectors  are  grouped  as  tightly  as  possible  about  llieir  respect i\e 
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Step  1:  Select  c  arbitrary  but  distinct  vectors  to  serve  as  initial  cluster 
centers,  i  =  l,2,...,c.  (The  user  specifies  the  value  of  c.) 

Step  2:  Assign  each  vector  in  the  data  set  to  the  nearest  cluster  center,  based 
on  lOuclidean  distance. 

Step  3:  Conipute  the  mean  vectors  for  the  data  assigned  to  each  cluster. 
Denote  means  by  Mj,  i  =  1.2,...,c. 

Step  4:  If  the  new  cluster  means  Mj  are  identical  with  the  old  cluster  centers 
Mj,  go  to  Step  r>.  Otherwi.se  set  M;  =  Mj.  i  =  l,2,...,c  and  return  to 
Step  2. 

Step  ■'):  Compufe  the  inti'rclust er  distances  (divergences)  and  m'  ;ge  indistinct 
clusters  (having  divergences  less  than  a  prespecified  threshold). 
C'lustering  complete. 

'I'he  clustering  algoritlitn  is  depicted  in  Figure  3.3.1.  On  every  iteration  of 
steps  2,  3,  and  4,  the  reassignment  and  movement  of  cluster  centers  (means) 
reduces  the  SSli  Since  there  is  a  lower  limit  on  the  SSI^  (it  cannot  be  made 
less  than  zero),  the  algorithm  is  guaranteed  to  terminate  via  step  5. 

3  3.2  Parallel  Implement  ;it ion 

I’lgure  3,3.2  contains  a  jjarallel  implementation  of  the  clustering  algoritlitn 
m  a  high-level  programming  language.  The  im|)lementat ion  is  of  the  SIMD 
type.  .N  is  the  number  of  Pi's  and  MxM  is  the  image  size.  The  PIOs  are 
configured  logically  as  a  \/N-by-\/N  grid  on  which  the  M-by-M  image  is 

M  M 

superimposi'd,  so  that  each  PI!  holds  a  --^-by-— ^  subimage  (for 
convetuence.  it  is  assumed  that  .M/v-^'  is  an  integer).  'I'he  '  local"  assignment 


Initialize 
cluster  centers 


AssiEjn  each  vector 
to  n ('a rest 
cluster  ccnte' 


( 'alculate  means  of 
new  'clusters" 


New  means  .. 

X.  •  k 

.Set  cluster 

identical  with  . » 

centers  equal 

previous 

to  new  means 

^  es 

(  ompute 

separal)ility 

in  format  ion 

Fig.  d.d.  1.  Basic  Clustering  .Algorithm 

/*  Iterative  ( 'lusteriiiK  AlRorithin  */ 


h 


\!l  e\eeiit('  in  parallel. 

mmiher(i),  i  =  1 . e  stores  ilie  iiiimher  of  pixels  in  tlie  corri'sixniciing  class. 

.Assume  ail  initial  c  cluster  centers  are  already  stored  in  "aeh  PK. 

Let  M;  =  [ii>|i,niio,....iii,„j^.  i  =  L‘^ . <'  Le  c  initial  cluster  centers. 

l.et  M|  =  [niiiaiijo i~1.2 . e  he  the  new  clusier  means. 

n  is  the  number  of  featnr<'s  per  [)ix<‘l.  N  is  the  number  of  I^L2s  and 
the  imaRe  size  is  MxM 

.Assunu'  this  is  a  checkerl)oard  typ('  data  allocation  scheme. 

Pl]s  are  arranged  in  a  vAX-by-x/N  array  and  (sach  Plv 


stores  a 


M  ,  M 

'Vs 


stihimage. 


*/ 


/»  Initialization.  Zero  the  means  and  pixel  counts.  */ 


for  i  1  to  c 

do  for  j  1  to  n 
do  nijj  ■<—  0 
end 

number  (i)  •<—  0 
end 


/*  Nearest  neighbor  assignment  */ 

/*  l.et  X  be  the  spectral  measurement  vector  of  a  pixel  where  x  =  [X|....,Xn]^  an 
n  is  the  no.  of  features  (ler  pixel. 

Array  pclass  stores  the  cluster  no.  to  which  pixels  belong, 
nijj  mean  for  class  i,  feature  j  at  end  of  iteration 
pchange  stores  the  no.  of  pix<'ls  which  change  class 
assignment  from  last  iteration  to  this  iteration  */ 


Fig.  Parallel  lmj)lcmentatio»  of  the  Clustering  Algorithm  (continued 

on  following  pages) 


T 


do  foi  P  ♦—  0  to  —7=  -  1 

v/N 


do  distance  ♦—  tOOOOOOO 
for  i  I  to  c 

do  eudist  *—  0 

for  j  I  (o  !) 

do  eudist  «—  eudist  +  (\j-iii|j)»(\ j-iii 
end 

if  eudist  <  distance  then  class  «—  i 

distance  <—  eudist 
end 


for  i  <—  1  to  n 


^  i  —  +  Xi 


nunil)er  (class)  ♦-  number  (class)  +  1 

if  (pciass  (k,P).NE.  class)  pchange  *—  pchange  +1 

pciass  (k,P)  <—  class 


/*  Use  recursive  doubling;  algorithm  to  merge  the  elements  in  vectors  Mj,  i  =  l 

and  array  number  (i).  i  =  l . c  from  each  I’l'. 

h  =  lognN 

After  merging,  the  results  will  be  stored  in  each  PIO.  */ 


for  i  ♦-  0  to  h  -  1 
do 

DTRi„  DATA  to  be  added 
TRANSFER  using  Cube; 
DATA  -  DATA  +  DTI?,„„ 
end 


Fig.  3.3.2.  (continued) 
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/*  Compute  the  new  cluster  means.  M,,  */ 

for  i  I  to  c 

do  for  j  1  lo  n 

do  iTiij  ♦—  iiiij/numher  (i) 

end 

end 

/*  ('ompute  the  percentage  of  class  change  for  the  whole  image  */ 

/*  I  se  recursive  doiihling  algorithm  to  merge  |)change  in  each  I’K. 

After  merging,  the  result  will  be  stored  in  IMC  0.  */ 

for  i  «—  0  to  h  -  I 

do  MASK  [X"  ‘  'I  X'l 
DTI? in  ^  i)change 
THANSFIvl?  using  Cubci 
MASK  (X"  ‘  *0  X‘I 
pchange  <—  pchange  +  DTR^^ 

<’nd 

MASK  10“! 

pchange  •►-  ([)chang<7(M*M))  *  100 

I*  If  pchange  is  less  than  a  prespecified  threshold,  then  the  clustering 
is  complete.  Compute  the  separability  information  (not  shown  here); 
otherwise  .set  Mj  =  Mj,  i  =  l,2,...,c.  and 
return  to  the  beginning  of  this  algorithm.  */ 

/*  To  test  if  pchange  is  less  than  t  or  not,  only  PE  0  is  enabled.  */ 

if  pchange  <  t  then  compute  the  separability 
else  Mj  Mj,  i-<,‘2,...,c 

go  to  the  beginning  of  this  algorithm. 


Fig.  3.3.2.  (continued) 


IIQ 

of  pixels  to  cluster  centers  is  performed  in  |):irall(>l,  so  (hat  each  IM]  contains 
the  result  of  a  clustering  iteration  for  the  suhimage  which  it  holds.  'I'o 
compute  the  new  cluster  means,  these  local  results  are  then  comhined  using  a 
form  of  overlapped  recursive  doubling. 

The  exact  data  allocation  scheme  used  for  this  algorithm  is  not  critical. 
The  only  r<‘quirement  is  that  l/N  of  the  image  elements  be  assigned  to  each 
PK. 

For  simplicity,  we  shall  assume  a  scenario  with  monochrome  imagery,  so 
that  the  number  of  features  per  pixel  (n)  is  I  for  this  algorithm.  Then  on  each 
iteration,  the  distance  calculation  and  determination  of  cluster  membership 
requires  (3  additions  +  c  subtractions  +  (c+1)  comparisons  +  c 

m2 

multiplications)  *  operations  for  each  PP.  The  recursive  doubling 

algorithm  to  inerge  the  means  Mj,  and  constants  number(i),  i  =  l,...,c  in  each  IMC 
requires  2c  parallel  transfers  and  add  operations  at  each  step.  Since  there  are 
log2N  steps  to  merge  the  data,  a  total  of  2clog2N  parallel  transfers  and 
additions  is  required.  After  merging  the  data,  c  division  opersitions  are  needed 
to  compute  the  new  cluster  centers.  Using  the  recursive  doubling  algorithm  to 
merge  pchange  from  each  PP  requires  log2N  transfers  and  additions,  and, 
2log2N  masking  operations.  Computing  the  percentage  of  class  change  requires 
(2  multiplications  4-  1  division)  operations  and  1  mask  operation.  Finally,  to 
test  if  pchange  is  less  than  a  prespecified  threshold  t  requires  1  comparison; 
only  PE  0  is  enabled  at  this  step. 
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3.3.3  Performance  Analysis 

f  or  application  of  the  performance  measures  to  the  clustering  algorithm, 
the  following  are  assumed: 

tj  =  time  for  1  integer  addition  operation, 

tg  =  time  for  1  integer  subtraction  operation, 

t^  =  time  for  1  masking  operation, 

tp  =  time  for  1  integer  comparison  operation, 

ty  =  time  for  I  integer  multiplication  operation, 

tj  =  time  for  I  integer  division  operation, 

tJN)  =  time  for  I  parallel  data  transfer  operation. 

For  simplicity  in  the  example  that  follows,  it  will  also  be  assumed  that 
tj  =  tj  =  t,.  =  ty/2  =  ij2  -  2t^  =tt(N)/2.  The  actual  relationships  among 
these  operations  will  be  implementation  dependent.  Time  for  loading  and 
storing  data  items  and  for  program  control  operations  such  as  testing  loop 
indices  has  not  been  explicitly  included  in  the  analysis.  The  time  required  for 
program  control  will  be  a.ssumed  negligible  in  comparison  to  the  other  times, 
and,  in  general,  the  program  control  operations  can  be  overlapped  with  the  PE 
and  inter-f’E  transfer  operations.  As  shown  earlier,  the  analysis  could  be 
modified  to  account  for  implicit  operations. 

I'he  performance  of  (he  clustering  algorithm  will  be  evaluated  as  a 
function  of  .\  for  an  .M-by-M  image,  with  M  ranging  from  2®  =  01  to 
2*®  =  8102.  The  evaluation  is  limited  to  machines  having  between  2®  and  2*‘* 
l’l']s  and  the  number  of  clusters  being  32. 
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Thf  soqupnfial  algorilhni  (o  <|o  ihr  clu^foririg  rcciiiiros  M"  [3  additions  +  c 
subtractions  +  (c  +  l|  comparisons  +  c  multiplications]  +  c  divisions  +  1 
comparison.  Thus,  the  serial  execution  time  (one  processor,  M“  pixels)  is  given 
by  T|(M“)  =  M“  [3  additions  +  c  subtractions  +  ((“Tl)  comparisons  +  c 
multiplications)  +  c  divisions  +  I  comparison  = 
M“  [3*t^  +  c*tj,  +  (c  +  I  )*t,.  +  c»ty|  +  c*t j  +  1,.. 

The  following  performance  criteria  applied  to  the  clustering  algorithm  are 
discussed  in  [29|. 

(a)  Execution  Time  (N  processors,  M"  pixels  ); 

Tn(M‘)  =  ^  l3*tj  +  c*t,  +  (c  +  l)*t^  +  c*tj,j  +  (c  +  l)*td  +  t,+(2c  +  l)log2N*t^  +  2*tj, 


The  execution  time  can  be  expressed  as  the  sutn  of  two  components, 
computation  time  and  overhead  dtie  to  parallelism.  The  computation  time  is 
given  by 

\  #2 

Cn(M^)  =  ^  [3*t3  +  c*tj  +  (c  +  l)*t,.  +  c*ty|+(c  +  l)*tj+t,.  +  (2c  +  l)logoN*t^  +  2*ty 

The  overhead  is  given  by 

C);s,(M-)  =  (2c  +  l)iog._,N*t,(.\)+(1 +‘2log.jN)*t„, 

The  serial  execution  time  is  T,(M").  Figure  3.3.3  shows  tin*  logo  of  the 
execution  timt  as  a  function  of  N  and  M  under  the  simplifying  assumptions 
outlined  above.  The  grai)h  has  been  norm.ilized  to  t,,  "  I.  I’or  larg('  images 
(e.g.,  M  >  1021).  it  is  clear  that  for  given  M  the  excciiijun  lime  decreases  as  X 
increases,  lor  such  M,  if  N  is  doiibletl.  then  the  execniidii  lime  is  decreased  bv 


approximately  a  factor  of  two.  For  .small  images,  an  increase  in  the  number  of 
PEs  has  little  effect  on  the  execution  time.  For  such  M.  the  time  required  for 
the  recursive  doubling  algorithm  dominates  when  N  is  large,  and  there  is  little 
or  no  advantage  to  using  a  large  machine.  F»)r  each  M  in  the  range  61  to  12S, 
the  rate  at  which  T|v;(M')  decrea.ses  (dTM(M")/dN)  also  de<rea.ses  as  N 
increases.  Therefore,  for  practical  purposes  it  may  be  appropriate  to  consider 
dTp^'(M“)/di\  as  well  as  Tjgl.M"). 

(b)  Speed-Up: 

Sn(M2)  =  T,(M2)/Tn(M-) 

Figure  3.3.4  shows  the  log.)  of  the  speed-up  as  a  function  of  N  and  M,  under 
the  simplifying  assumptions.  For  large  M,  Sn(M")  ~  N,  i.e.,  the  speed-up  is 
almost  ideal  for  all  N  considered.  Therefore,  using  Sjvj(M")  as  the  performance 
criterion  wouid  dictate  using  as  many  PEs  as  are  available.  For  small  M, 
Sjvj{M^)  «  N,  and  the  choice  of  N  has  little  effect  on  the  speed-up.  For 
example.  f(jr  M  =  64,  there  is  little  advantage  to  using  more  than  2®  =  ■')I2 
PEs.  For  small  M,  there  exists  a  value  for  N  up  to  which  increasing  the 
number  of  PEs  significantly  increa.ses  the  speed-up,  but  beyond  which  there  is 
little  advantage  to  increasing  N.  Thus,  it  appears  that  using  a  combination  of 
speed-up  with  d(speed-up)/dN  (and/or  a  me.asure  such  as  utilization  or 
efficiency  -  L;ee  below)  is  a  more  practical  criterion  than  s|)eed-ui)  -•lone. 


(c)  Efficiency: 


En(M2)  =  Sn(M2)/N 

=  T,(M2)/(N*T^(M-|) 

Figure  3.3.5  shows  tlie  efficiency  for  ;i  range  of  values  of  M,  under  the 
simplifying  assumptions.  .4s  required  by  the  definition,  0  <  <  I.  For 

all  M,  the  efficiency  decreases  as  N  increases,  but  the  rate  of  decre.-ise  is  slower 
for  large  M.  For  large  VI,  the  efficiency  is  high  for  all  N  considered,  and  the 
choice  of  N  does  not  significantly  affect  Ei^(M").  For  small  M,  E>;(M*)  is  small 
for  large  N.  The  conclusion  that  tlx*  elliciency  is  poor  fi>r  the  choices  of  l.irge 
N  and  small  M  is  consistent  with  the  information  from  the  exi'ciition  time 
measure.  From  the  observation  that  IC;v;(M")  is  higher  for  large  M,  it  can  be 
concluded  that  for  a  fixed  N,  there  may  be  no  advantage  to  decomposing  a  size 
MxM  problem  into  smaller  subproblems,  even  if  the'  result  can  later  be 
recombined  at  low  cost. 

(d)  Utilization; 

=  y:'t^F,/(N*T^j(NU) 

x=0 

where  t^  =  time  perform  step  x 

=  no.  of  I’l'.'s  .active  for  step  x 

X  =  no.  of  steps  in  the  N  I’l!  <  om|iiiiat  ion 

\  I 

(1'he  computation  time  cv^lM")  “  V  ij. 

<  =  o 


To  derive  the  utilization  requires  counting  the  number  of  I’Es  activ<‘  for  each 
computation  step.  In  the  stage  of  merging  the  |)change  in  each  I’E.  log.>.N  st(>ps 


of  recursive  doubling  are  used.  Af  step  i,  1  <  i  <  logoN.  the  number  of  F’Ks 
performing  the  additions  is  N/“2‘.  Summing  over  the  log._,N  steps  gives  In 

the  stage  of  testing  wliether  peliange  is  less  than  t  or  not.  only  I’lO  0  is  eii.ibh'd. 
For  the  rest  of  tlie  computation  operations,  all  .\'  I’lOs  are  enabled.  Figure 
3.3.6  shows  the  utilization  under  the  simplifying  assumptions.  In  this  example, 
because  most  eomputation  steps  use  all  N  FHs, 

V  tj\  ^  N  *  Cn{M-)  . 

Therefore,  here  ~  Cf,;(M“)/'r;vi(M-). 

Efficiency  is  directly  affected  by  both  overhead  and  utilization.  .\s  oik' 
may  expect,  efficiency  will  decrease  as  overhead  increa.ses  and/or  utilization 
decreases.  If,  for  a  given  set  of  parameters,  elliciency  is  low,  theti  the  overhead 
and  utilization  may  be  examined  to  determine  factors  contributing  to  the  low 
efTiciency.  For  the  clustering  algorithm,  both  overhead  and  utilization  cause 
efficiency  to  decrease  as  N  increases. 

(e)  Price:  The  price  for  the  clustering  example  is 
Pn(M2)  =  Pt*TN(M2)+IV[N*FVE  +  N*ig 

where  I\  =  cost  of  a  unit  of  execution  time 
Pj>f;=  cost  of  a  I’FC 
F’j.  =  cost  of  a  iK'twork  switch 
I’j  =  relates  total  implementation  cost  to 
hardware  costs 

A.ssuming  a  single  stage  cube  network  is  used,  the  number  of  switches  is  N. 
Figure  3.3.7  shows  the  logo  <»f  the  price  under  the  simplifying  a.ssumptions  plus 


l.{0 

(he  assumptions  that  l\  =  I*j  =  1,  and  =  .32*  1’^  =  1.  For  small  M,  the 
curve  has  a  minimum  in  the  range  2**  <  N  <  2'*.  Furthermore,  the  optimal  N 
is  great(‘r  for  large  images  than  for  small  images.  1'his  occurs  because  for  large 
image'',  exectition  time  continues  to  (hvrease  significantly  as  N  increases,  while 
for  smaller  images,  the  rate  at  which  'l\(M“)  decreases  falls  off  for  large  N, 

I'he  generalized  price  for  the  clustering  algorithm,  under  the  same 
assumptions,  is  given  by 

=  -^♦Pt*'I\.(M“)  +  ^*Pi*lN*IVE  +  N*PJ 

o  +  I  o  T  I 

where  o  expresses  the  relative  importance  of  execution  time  versus  system  cost. 
Figure  .3.3.S  shows  the  gi'ticralized  price  as  a  function  of  N  and  o  for  a 
I02lxl021  image.  As  expected,  the  optimal  value  of  N  shifts  to  the  right  as 
execution  time  becomes  more  critical  than  system  cost  and  to  the  left  when 
system  cost  dominates  (r><l). 


.3,4  Parallel  Architecture 

From  the  discussion  of  the  previous  three  sections,  a  multiprocessor  system 
which  can  be  dynamically  reconfigured  as  one  or  more  independent 
SfMD/MlMD  submachines  of  different  sizes  is  needed  to  implement  the 
supervised  relaxation  algorithm  In  this  section,  the  partitioning  of  the 
interconnection  networks,  culie  network  and  augiiH'iitcd  data  manipulator 
network,  is  considered,  'I'ypically,  when  the  number  of  jirocessing  eh'ments  in 
(lie  multiprocessor  system  increases,  the  ilata  communications  bi'tween  the  PlCs 


l)ecomi‘  more  and  more  important.  Ilspecially  for  multiple  submachines  of 
SIMI)  and  MI.MI)  modes,  it  is  reipiired  that  different  submachines  can 
''imu It .iiK'oiisIy  exi'(iite  their  instructions  and  do  not  interfere  with  each  other. 


liut,  th^'y  still  can  cooperate  with  each  other  in  the  results  through  the 
interconnection  network  after  certain  stages  of  computations.  The  routing 
schemes  presented  below  for  both  cube  network  and  ADM  network  and  hybrid 
type  network  are  (juoted  from  (ib.lT.IS.It)].  The  network  adopted  is  so  well 
suited  to  the  supervised  relaxation  algorithm  that  substantial  speedup  by  the 
multiprocessor  systiun  is  expected. 

3.4.1  Hybrid  .N'etwork 

As  shown  in  Fig.  d.1.1,  the  hybrid  cube  network  is  an  integration  of  two 
networks,  the  unidirectional  packet-switched  cube  network  and  the  bi¬ 
directional  circuit-sw itclual  cube  network.  This  network  is  suitable  for  the 
parallelism  modes  configured  as  shown  in  Fig.  3.2.1  because  large  blocks  of 
input  data  can  be  pipelined  through  the  lower  half  network  and  fast  data  and 
instruction  fetches  can  be  provided  by  the  upper  half  network.  The  upper  half 
of  (he  hybrid  cube  network  is  a  PI>to-Fl>  configuration  in  which  the  cube 
network  is  wrapjied  around  an<l  connected  to  N  FlCs,  I'here  is  a  local  memory 
module  associated  with  each  IMv  'fhe  lower  half  of  (he  hybrid  cube  network  is 
a  !’!•:  -to-memory  configuration  in  which  N  IMvs  are  connected  on  one  side  and 
.NI  memory  modules  are  connected  on  the  other  side.  The  memory  spaces  in  the 
.\'  memory  modules  shared  by  (he  \  I’Ks  are  much  larger  than  the  local 
memory  modules  in  the  N  lM]s.  'i'he  interchange  boxes  at  the  mth  stage 
(m  =^log.,.\)  (an  be  set  to  connect  to  the  upper  half  or  (he  lower  half  of  the 
liybrid  cube  network  by  setting  (he  interchange  boxi's  to  straight  or  exchange; 
they  each  have  one  input  port  connected  to  a  IMvs. 

The  advantage  of  the  I’Ivto-IMC  configuration  is  fast  local  memory 
aece.sses  For  .SIMI)  mode,  data  are  stored  in  the  local  memory.  Therefore. 
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this  configuration  can  provide  fast  instruction  and  data  fetches.  For  the  PE- 
tonieniory  configuration,  N  PEs  share  large  blocks  of  data  stored  in  N  memory 
modules  and  can  vary  the  amount  of  memory  used  by  each  processing  element. 

For  the  packet-switched  cube  network,  a  packet  makes  its  way  from  stage 
to  stage,  releasing  links  atid  ititerchatige  bo.ves  immediately  after  using  them. 
It  is  good  for  MIMI)  mode  which  needs  a  fre<iuently  changing  path  through  the 
network  when  performing  window-type  operations.  Therefore,  every  PE  can 
request  data  front  a  neighboring  PE  by  sending  a  message  via  the 
interconnection  tietwork  and  the  reqiK'sted  I’E  can  respond  accordingly.  As  a 
result,  it  can  retluce  contention  in<-urred  by  sending  two  packets  to  the  same 
input  port  of  the  interchange  box  or  dispatching  two  packets  from  the  same 
out|)ut  port  of  the  interchange  box.  For  the  circuit-switched  cube  network,  a 
comj)lete  circuit  is  established  from  input  port  to  output  port  for  one  particular 
path.  It  is  good  for  SIMI)  mode  which  can  provide  parallel  data  transfers  via 
the  interconnection  network  and  al.so  good  for  transferring  large  blocks  of  data 
from  the  shared  memory  modules  to  PEs.  Therefore,  data  transfers  can  be 
pipelined  through  the  network.  Once  this  path  is  established,  the  only  delay  is 
jiropagation  delay. 

The  u|)per  half  of  the  hybrid  <ube  network  is  an  unidirectional  network. 
•Since  the  inputs  and  outputs  of  the  network  are  connected  to  PEs,  the 
unidirectional  network  is  sunicieiit.  For  the  lower  half  of  the  hybrid  cube 
network,  sinci-  large  blocks  of  data  need  to  be  transferred  between  PlCs  and 
memor>  inoduli's,  a  biilirect ional  m-twork  is  necessary  to  provide  this  facility. 

.Actualh.  the  hybrid  ( ube  network  contaitis  N  input  ports,  2N  outiiut 
ports  and  two  size  N  generalized  cube  networks.  It  has  m+1  stages  labeled 
from  m  to  0.  Intercliange  boxes  in  stage  m  divide  the  network  into  two  halves. 
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That  is,  the  interchange  boxes  can  be  set  to  connect  to  either  the  upper  half  or 
lower  half  of  the  network. 

If  the  cube  network  is  replaced  b\  the  .M)M  network,  then  (he  h\l)rid 
cube  network  be(•oIne^  (he  hybrid  \l).\l  network.  The  ADM  netwdik  i>  more 
powerful  than  the  cube  network.  Tor  the  multistage'  eiibe  network,  there  i^ 
only  one  path  between  a  particular  network  input  and  output.  Dut  for  the' 
multistage  ADM  network,  there  are  multiple  paths  between  a  give'ii  network 
input  and  output.  Thus,  (he  hybrid  ADM  network  can  reroute  the  packets 
through  another  path  if  the  current  established  path  is  brok('n  or  has  busy 
switching  elements  in  it.  It  can  perform  any  i)ermut;it ion  that  a  multistage 
cube  network  can  perform.  However,  in  addition  to  these  advantages  for  the 
hybrid  /VDM  network,  there  are  also  additional  implementation  costs  and 
control  complexity  that  the  hybrid  cube  network  does  not  have. 

If  (he  multiproce.ssor  system  with  the  hybrid  network  is  used  to  impleim'iit 
the  supervised  relaxation  algorithm,  the  data  inputs  to  the  subalgorithms  in 
blocks  .\,  M,  and  C'  can  be  stored  in  the  memory  modules  lirst.  'FIk'  input  d.it.i 
tend  to  be  large.  Therefore,  the  memory  modules  can  provide  large  storagi' 
space  for  them  and  data  can  be  retrieved  on  demand.  .After  using  tlu'm,  results 
produced  can  be  transferred  back  to  memory  modules  in  order  to  save  memory 
spaces  in  Dl']s  for  other  purposes. 

3. 1  2  Partitioning  the  Cube  Network  into  2  .MlMDs  and  I  SIMD 

Partitioning  and  riuiting  schetnes  for  the  cube  network  are  di'scribed  in  (In* 


Appendix. 


i.'it) 

For  simplicity,  assiinu*  there  .ire  \  =  Jti  IM's.  We  \vai)t  to  partition  these 
FHs  into  2  MIMI)  snhmaehiiies  an<l  I  SIMI)  submachine  for  the 
implementation  of  the  supervised  relaxation  algorithm  in  this  multiprocessor 
system.  One  MIMI)  has  8  I’lOs  and  the  other  MIMI)  has  2  Pi's.  The  SIMI)  has 
1  PEs  plus  one  PE  for  the  control  unit.  Thus  a  total  of  15  PEs  have  been 
used.  Now,  the  remaining  probhun  is  how  to  partition  the  network  into  2 
MlMDs  and  one  SIMI)  with  the  control  unit  associated  with  it.  If  the  network 
can  be  partitioned  into  thesi'  three  independent  groups,  then  the  PEs  connected 
to  this  network  are  automatically  partitioned  into  three  corresponding 
independent  submachines. 

For  MIMI)  mode  with  size  8,  the  addresses  of  these  8  ports  must  differ  in  3 
bit  positions.  For  example,  if  port  addresses  from  0  to  7  are  assigned  to  the 
MIMI)  machine,  their  addresses  will  all  agree  in  only  the  most  significant  bit 
position.  Ily  setting  to  straight  the  interchange  boxes  i;i  stage  3  corresponding 
to  the  input  port  addresses  ranging  from  0  to  7,  one  MIMD  with  size  8  is 
formed.  Similarly,  for  MIMD  mode  with  size  2,  the  addresses  of  these  two 
must  differ  in  1  bit  position.  Let  us  say  that  port  addresses,  10  and  11,  form 
one  MIMI).  Therefore,  two  addresses  will  agree  in  the  upper  3  bit  positions. 
Hy  selling  lo  straight  the  inlerchange  l)oxes  in  stages  ,3  2,  and  1  corresponding 
lo  ihcNC  two  port  .iddn  ss<>,  one  indi'pendenf  Ml.Ml)  with  size  2  is  formed.  For 
SIMI).  if  port  address  8  is  selected  as  the  control  unit  and  port  addresses 
ranging  from  12  to  15  are  selected  for  the  processing  PEs,  then  these  latter  4 
addresses  agree  in  the  most  significant  bit  position.  By  setting  to  straight  the 
Interchange  boxes  in  stage  3  corri-sponding  to  port  addre.sses  from  12  to  15, 
these  I  addressi's  form  one  inde|)endent  group.  PE  8  can  broadcast 
instructions  to  lluse  I  by  calculating  the  routing  tag  as  follows; 


R=S  ©  I)i=8  ®  12  =  1000  ©  1100=0100 

and 

B=l)i©l)j=i2©  ir,=noo©  1111=0011 

(the  notation  is  defined  in  the  Appendix).  For  I’lO  S  hy  setlitiK  •I'*'  interehanKe 
boxes  in  stage  3,  2.  1,  and  0  to  straight,  exehangc*.  broadcast,  and  l)roadcast.  it 
can  broadcast  instructions  to  l’l']s  12,  13,  11,  and  13.  The  result  is  shown  in 
Fig.  3.1.2. 

3.4.3  Partitioning  the  ADM  Network  into  2  MIMDs  and  1  SIMI) 

Since  the  determination  of  routing  tags  can  be  found  in  [10.17],  this 
section  only  overviews  the  advantages  of  the  ADM  network.  Most  important  is 
the  description  of  the  partitioning  of  the  .\DM  network  into  2  MIMDs  and  1 

SIMI). 

The  advantages  of  the  ADM  network  as  de.scribed  before  are  the  multiple 
paths  existing  between  a  network  input  and  output,  and  the  additional 
permutations,  as  compared  to  the  cube  network,  which  the  ADM  network  can 
perform. 

The  ADM  network  consists  of  N  input  ports,  N  outj)ut  ports,  and 
m  =  log2N  stages  with  N  switching  elements  per  stage.  Stages  are  nutnbered 
from  m-1  to  0.  Each  switching  element  in  stage  i  performs  a  straight 
connection  and  the  PM2I  (plus-minus  2')  interconnection  function  which  is 
defined  as 

l’M+i(j)  =  (j  +  2')  tnod  N 

I’M  i(j)  -  (j  “  2')  mod  N 

for  0  <  j  <  N  and  0  <  i  <  m 


Therefore,  each  node  j  at  stage  i  of  the  network  has  three  output  lines.  Hut, 


there  are  only  two  distinct  data  paths  from  each  node  in  stage  m-1  [17],  The 
ADM  network  is  shown  in  Fig.  3.1..'$.  Tlie  individual  switching  elements  in 


every  stage  can  be  controlled  inde|>endently  and  routing  tags  are  employ «(l  to 
distribute  control  of  the  ni'twork  among  the  pnues.sors  to  avoid  the  bottleneck 
created  by  the  centralized  control  unit. 

/Vssume  there  are  .\  =  lb  IMOs  connected  to  the  .ADM  network.  The 
network  can  be  partitioned  into  one  SIMD  machine  with  size  2,  one  MIMD 
machine  with  size  1,  and  oih'  MI.MD  machine  with  size  2  as  shown  in  I'ig.  .'{.  I.l. 
For  the  MIMD  machine  of  size  I,  if  the  port  addn-sses  include  I,  3,  fl,  and  !.■$, 
set  the  corresponding  switching  elements  in  stages  0  and  1  to  straight.  For 
MIMD  machine  of  size  2,  if  this  group  includes  ports  7  and  15,  set  the 
corresponding  switching  elements  in  stages  0,  I,  and  2  to  straight.  The 
principle  of  partitioning  is  that  the  size  of  each  subnetwork  must  be  a  power  of 
2  and  the  addresses  of  the  input  ports  in  the  subnetwork  of  size  2""  must  agree 
in  their  lower  order  m-s  bit  positions.  For  SIMD  mode,  port  II  acts  as  the 
control  unit  to  broadcast  instructions  to  ports  2  and  1.  1'he  routing  tag 
contains  two  parts:  {R,B}  =  {11001,  10001}.  In  [16,17],  calculation  of  the 
routing  tags  is  described.  Therefore,  F’B  II  sets  the  switching  elements  in 
successive  stages  to  be  -2^,  straight,  straight.  2'-type  broadcast.  Thus,  it  can 
broadcast  instructions  to  I*I]s  2  and  1.  The  2  MIMD  modes  and  I  SIMD  mode 
in  the  independent  subnetworks  shown  in  Fig.  3.1.1  have  the  complete 
interconnection  capabilities  of  the  ADM  network. 

The  important  ability  of  this  network  is  that  it  can  dynamically  reroute 
the  message  through  alternate  available  p.aths  to  gain  fault  tolerance  as  well  .as 
improved  thrtuighput  [17]. 


MIMD  =  {1,5,9,13} 
MIMD  =  {7,15} 
SIMD  =  {2,4} 


I  ip;  .{  (  I  \I)\I  Ni'lwork  willi  *J  \ll\ll)s  .iii'l  I  S|\ll) 


3.5  Sutntiiary  and  Conclusions 


In  Section  3.1,  it  was  shown  that  the  inaximuin  parallelism  achieved  at 
this  level  of  modeling  is  reduced  by  the  nature  of  the  algorithm  and  especially 
l)y  the  scalar  proce.sses  involved  in  the  testing  of  the  loop  despite  the 
availability  of  1021  Pl'].s.  It  also  was  shown  that  the  average  parallelism  h, 
which  accounts  for  both  concurrency  of  processing  elemeins  and  for  the  overlap 
of  array  and  scalar  or  control-flow  instructions,  is  usually  larger  than  the 

-r 

average  vector  parallelism  h  which  only  accounts  for  the  concurrency  of 
processing  elements  and  doesn’t  consider  the  overlap  of  array  and  scalar  or 
control-flow  instructions.  If  the  number  of  processing  elements  is  relatively 
small,  the  overlap  of  the  control  unit  with  the  array  proce.ssors  may  be 
significant.  Otherwise,  the  overlap  of  the  control  unit  with  the  array  processors 
contributes  very  little  to  the  degree  of  parallelism. 

An  interesting  result  of  the  analyses  in  this  section  was  that  the  total  time 
required  to  classify  an  image  using  a  pipeline  is  greater  than  that  using  an 
SiMI)  architecture  even  though  tin*  utilization  of  the  I’lCs  is  greater  for  the 
|)ipeliiu‘  'I'he  reason  is  that  the  overhead  due  to  implementing  the  |)arallelism 
for  the  pipeline  is  greater,  d'he  ov<>rhead  inchides  scalar  operations  executed  at 
each  stage  of  the  pipeline  and  data  transfers  between  stages  of  the  pipeline.  In 
Sl.MIt  mode.  I’I'>  address,  and  general  masking  operations  cost  nothing  in  the 
total  algorithm  run  lime  if  we  .issiime  that  masking  opi-ralions  are  dc'coded  in 
the  control  unit.  'Phis  jiroperty  is  .iKo  modeh-d  with  S-Nets.  d’lu'refore.  the 
transition  firing  lime  is  fiot  incre.-ised  lo  .accommodate  mask  m.arkings  Ih.al  are 
not  all  I’s.  Data  Iransb-r  lime  .and  d.at.a  Ir.ansfeps  through  the  interconnection 
network  can  .also  be  modeled  with  S-Nets,  such  as  tlu'  d.ala  transfers  in  vector 
sum  exam|)h'  Mased  on  the  S-Net  modcd,  system  throughput  can  be  increased 


by  maximizing  the  h  and  h  measures. 

Also  in  this  section,  we  modeled  a  complex  algorithm  (maximum  likelihood 
classification)  with  S-Nets  on  SIMl)  and  piix'linc  architectures.  Some 
quantitative  measures  such  as  parallelism  and  execution  time  were  considered 
in  conjunction  with  S-Nets.  Due  to  the  availability  of  quantitative  measures, 
we  can  make  direct  comparisons  between  two  architectures  based  on  a  given 
image  size  and  number  of  PEs.  In  general,  for  image  proce.ssing  operations 
which  are  not  window-type  operations,  the  higher  the  dimensionality  of  the 
remote  sensing  data  and  the  more  classes  rejiresentc'd  in  the  image,  tin'  greater 
the  potential  benefits  to  be  derived  from  SIMl)  implementation  of  the  proci'ss. 

Section  3.2  discussed  the  comparison  between  exc'cution  time’s  of  SIMl) 
and  MIMD  processors  based  on  the  analysis  of  explicit  and  implicit  operations 
embedded  in  an  algorithm.  In  Section  3.2.1  it  was  di'inonst ratc'd  that  the’ 
MIMD  mode  is  better  than  SIMl)  mode  for  the  algorithms  which  are  not 
suitable  for  lockst(’i)  opi'ration.  Of  course.  MINI!)  mode  needs  ,i  more' 
complicated  control  str.itegy  and  has  more*  complic.ited  methods  fur  d.ila 
transfers  than  SIMl)  mode.  In  SIMl)  mode,  every  step  is  synchroni/ed  ,ind  the 
control  strategy  is  simpler.  In  general,  algorithms  which  have  window-type 
operations  and  do  not  have  operators  whose  (xecution  times  an'  argument- 
dependent  are  suitabU’  for  SI.MD  mode  Ix’causi'  tlu’y  can  take  ad\ .iiil.ige  of 
overlapping  of  scalar  and  array  opc'ratioiis,  paralh’l  dat;i  t r.-itisfi'r''.  s\  ih  hrotii/cd 
operation,  and  simpler  control  schenu’s  without  loosing  the  bettc-r  ut ili/.at ion  of 
FP]  resources.  Algorithms  which  d<»  not  have  w indow-ty jx'  operations  ,an  I  h.ive 
operators  whose  execution  times  are  argument-di'pendent  an-  suitable  for 
MIMD  mode  because  thi’y  can  avoid  the  interrupt  ojx’rations  for  asynchronous 
data  transb’rs  and  dc’adlock  pn’vi’ntioii  scheme’s,  aehie’ve'  Ix’lter  iitili/.ition  of 
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PE  processing  power,  and  minimize  execution  times.  For  algorithms  which 
have  window-type  operations  and  are  suitable  for  MIMD  mode,  data  transfers 
can  be  executed  in  parallel  before  performing  any  data  computation  or  can  be 
executed  in  a  more  general  fashion  though  interrupt  operations  provided  that 
the  complicated  control  scheme  of  the  system  is  available.  We  assume  that 
asynchronous  data  transfers  through  interrupt  operations  and  control  schemes 
can  be  supported  by  the  hardwarc/software. 

The  supervised  relaxation  algorithm  contains  several  subalgorithms,  some 
of  which  have  execution  times  that  are  argument-dependent.  These 
subalgoritlims  are  suitable  for  xMIMD  mode  in  order  to  most  efficiently  utilize 
the  proce.ssing  power  of  PEs  and  minimize  execution  times.  On  the  other 
hand,  subalgorithms  with  execution  times  which  are  noi  argument-dependent 
are  most  suitable  for  SIMD  mode  in  order  to  take  advantage  of  the  simpler 
control  strategy  and  lockstep  operations.  PASM  [26,27],  a  partition  able, 
reconfigurable  SIMD/MIMD  multimicroprocessor  system,  offers  an  interesting 
environment  in  which  to  implement  the  supervised  relaxation  algorithm.  It  can 
be  configured  a.s  one  SIMI)  submachine  and  two  MIMD  submachines  of 
dilferent  sizes  for  (he  earlier  processing  stages  and  later  can  be  reconfigured  as 
an  SIMD  machine  with  maximum  PEs  in  order  to  avoid  a  bottleneck  due  to  a 
|)articularly  complex  operati('n.  From  Tal>Ie  3.2.3,  it  was  seen  that  the 
speedup  of  both  SIMD  and  MIMD  submachines  will  be  close  to  (M/n)^,  which  is 
(he  number  of  Phis,  if  (he  subimage  size  is  large  enough  so  that  the  overhead 
caused  by  parallelism  is  not  a  dominant  factor. 

In  Section  3.3,  analysis  of  a  representative  algorithm  (clustering) 
emphasized  system  i)erformance  as  a  function  of  problem  size  and  system  size. 
.Although  (he  system  performances  are  for  SIMD  mode,  it  can  be  generalized  to 
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MIMD  mode  as  long  as  data  transfers  are  executed  in  parallel  before 
performing  data  computation,  and  the  computation  time  for  MIMD  mode  must 
take  into  account  scalar  operations  such  as  loop  counting,  branching,  and 
subroutine  calls  and  returns.  The  other  processing  environment  of  MIMD  mode 
is  the  same  as  that  described  in  Section  3.2.  We  recall  that  the  subalgorithms 
in  blocks  A,  B,  and  C  as  shown  in  Fig.  3.2.1  can  be  executed  independently 
and  simultaneously  because  the  input  data  and  results  of  these  three 
subalgorithms  are  independent  of  each  other.  For  systems  aimed  at  real-time 
processing,  it  is  best  that  these  subalgorithms  produce  their  results  almost  at 
the  same  time  to  provide  input  data  for  block  D.  Ba-sed  on  the  “execution 
time”  measure,  we  can  assign  different  numbers  of  processing  elements  for 
subalgorithms  in  blocks  A,  B,  and  C  to  synchronize  their  outputs  and  minimize 
the  execution  time.  But  by  considering  execution  time  alone,  it  is  possible  to 
select  the  number  of  PEs  such  that  the  marginal  improvement  in  performance 
is  very  small  or,  conversely,  such  that  significant  improvements  may  be 
sacrificed.  The  execution  time  does  not  directly  address  issues  related  to  how 
effectively  the  .system  resources  are  being  used.  In  general  the  “speed-up”, 
“efficiency”,  and  “utilization”  me;i.sHres  used  together  can  achieve  better 
utilization  of  PE  resources  and  synchronize  the  outputs.  By  considering  the 
total  number  of  I’Es  assigned  to  blocks  A,  B,  and  C  as  system  cost  and  the 
required  execution  time  to  synchronize  the  outputs  as  execution-time  cost,  the 
“price”  and  “generalized  price”  measures  can  provide  a  means  to  choose  the 
number  of  PlCs  which  seems  to  be  a  compromise  between  execution  time  and 
system  cost. 

In  Section  3.1,  we  considered  two  interconnection  networks  which  can 
provide  the  communicaticin  among  the  Pi's  in  the  multiprocessor  system  and 
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CIL\PTER  4  -  CONCLUSIONS  AND  SUGGESTIONS 


I 

The  supervised  relaxation  operator  was  demonstrated  successfully  as  a 
mechanism  for  combining  the  information  from  multiple  ancillary  data  sources 
with  the  information  from  spectral  data  and  spatial  context  in  classifying  of 
multispcctral  imagery.  In  Chapter  2,  the  method  was  described  and  the 
convergence  of  the  iterative  algorithm  was  established.  The  final  labeling 
results  from  convergence  of  evidence,  reaching  consistent  labeling.  The  key 
inputs  to  the  supervised  relaxation  operator  are  a  quantitative  characterization 
of  initial  probabilities  computed  from  the  spectral  data  and  conditional 
probabilities  computed  from  the  contextual  information.  The  initial 
probabilities  were  obtained  in  the  penultimate  step  of  the  maximum  likelihood 
classification  using  spectral  data;  the  conditional  probabilities  were  calculated 
from  another  source  of  data  known  to  be  correct.  The  experimental  results 
showed  that  the  performance  of  the  method  using  spatial  contextual 
information  was  only  slightly  better  than  that  obtained  from  the  maximum 
likelihood  classifier;  the  performance  with  one  ancillary  information  source  was 
much  better  than  previously  obtained;  and  the  performance  measure  with  two 
ancillary  information  sources  wa.s  still  better.  The.se  results  demonstrate  that 
the  supervised  relaxation  algorithm  is  a  useful  technique  to  integrate 
information  from  the  various  sources  and  achieve  a  ground  cover  classification 


which  is  both  accurate  and  consistent  in  the  face  of  inconsistencies  which  may 
exist  among  the  data  components. 

In  Chnpter  3,  S-Nets  were  used  to  describe  the  maximum  likelihood 
algorithm  implemented  in  SIMD  and  pipeline  modes  of  parallelism.  Analysis  of 
the  S-Net  models  showed  that  the  overhead  incurred  by  the  pipeline  causes 
longer  execution  time  than  SIMD  mode.  PE  address  and  general  mask 
operations,  which  for  SIMD  are  decoded  in  the  control  unit  and  cost  nothing  in 
the  algorithm  run  time,  can  also  be  modeled  with  S-Nets,  as  can  the  data 
transfer  time  and  data  transfers  through  the  network. 

Some  quantitative  measures  such  as  average  parallelism  and  execution 
time  were  also  developed  and  used  to  make  direct  comparisons  between  two 
architectures.  Detailed  descriptions  and  analyses  of  implicit  operations,  explicit 
operations,  loop  counting  operations,  parallel  transfers,  and  interrupt 
operations  occurring  in  SIMD  and  MIMD  modes  of  parallelism  were  presented 
in  Section  3.2.  Based  on  those  analyses,  the  comparison  of  execution  times 
derived  can  lead  to  the  right  decision  concerning  which  mode  of  parallelism, 
SIMD  or  MIMD,  is  best  suited  to  one  specific  algorithm.  In  general,  algorithms 
which  have  window-type  operations  and  do  not  have  operators  whose  execution 
time  are  argument-dependent  are  suitable  for  SIMD  mode  because  they  can 
make  use  of  overlap  of  .scalar  and  array  operations,  parallel  data  transfers, 
synchronized  operation,  and  simpler  control  scheme.  Algorithms  which  do  not 
have  window-type  operations  and  have  operators  whose  execution  times  are 
argument-dependent  are  best  suited  to  MIMD  mode.  Even  for  algorithms 
which  have  window  type  operations  and  have  operators  whose  exeeution  times 
are  argument-dependent,  MIMD  mode  is  .still  suitable  as  long  as  data  transfers 
can  be  executed  in  parallel  fashion  before  performing  data  computation. 
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Section  3.3  described  the  determination  of  the  optimal  number  of  processing 
elements  for  a  given  image  size  based  on  measures  of  evaluating  the 
performance  of  algorithms  for  SIMD  machines.  An  important  concept  is  that 
the  number  of  PEs  can  be  chosen  subject  to  the  relative  importance  of  system 
cost  and  execution  time. 

Finally,  two  multistage  interconnection  networks  wore  described,  the  cube 
network  and  ADM  network,  which  provide  the  communication  medium  for  the 
multiprocessor  system  and  can  be  configured  to  operate  as  subnetworks 
supporting  complex  tasks. 

The  neighborhood  contributions  from  the  four  nearest  neighbors  have  been 
used  in  the  current  study.  For  further  research,  parallel  implementations  may 
be  developed  to  utilize  neighborhood  relations  beyond  just  near  neighboring 
pixels  to  combine  the  contextual  information  in  the  algorithm.  Equal 
weighting  constants,  djj,  have  been  assigned  to  the  four  nearest  neighbors;  i.e., 
these  four  neighboring  pixels  have  equal  degree  of  influence  in  the 
neighborhood  contribution.  If  the  weighting  constants  can  be  dynamically 
adjusted  to  allow  different  neighbors  to  have  different  degrees  of  influence  on 
the  current  pixel  classification,  the  classification  accuracy  expected  may  be 
better.  Furthermore,  in  a  more  complicated  cas»  in  which  one  ancillary  data 
source  is  felt  to  be  more  accurate  than  another,  it  is  po.ssibIe  to  assign  two 
different  weighting  constants  to  the  ancillary  variables  in  the  supervised 
relaxation  process. 

The  current  modeling  based  on  SfMD  and  pipeline  architectures  appears  to 
work  quite  well.  A  future  challenge  is  to  use  S-Nets  to  model  control  strategies 
or  operating  systems  for  various  type  architectures.  Also,  further  investigations 
should  exploit  algorithms  on  MIMI)  architectures  and  on  MSIMD  architectures 
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in  which  SIMDs  operates  asynchronously.  If  the  total  elapsed  time  of  the  S- 
Net  the  transition  sequences  and  the  average  parallelism  of  the  S-Net  can  be 
quantitatively  determined,  together  with  the  description  presented  in  Section 
3.2,  the  results  may  provide  a  more  reliable  way  to  decide  which  architecture 
model  is  best  suited  to  a  particular  algorithm. 


LIST  OF  REFERENCES 


|1]  R.M.  Hoffer  and  staff,  “Computer-aided  analysis  of  SK\T.AB 
multispectral  scanner  data  in  mountainous  terrain  for  land  use,  forestry, 
water  resource,  and  geologic  applications,”  Tech.  Rept.  No.  121275, 
LARS,  Purdue  University,  1975. 

(2]  J.  Flichards,  D.  Landgrcbe,  and  I*.  Swain,  “A  means  for  utilizing 
ancillary  information  in  multispectral  classification,”  Remote  Sensing  of 
Environment,  Vol.  12,  pp.  463-477.  1982. 

[3]  A  Rosenfeld,  R.  Hummel  and  S.  Zucker,  “Scene  labeling  by  relaxation 
operations,”  IEEE  Trans.  SysL,  Man,  Cybern.,  Vol.  SMC-6,  pp.  420-433, 
June  1976. 

I'll  B-.  Schachter,  A.  Lev,  S.  Zucker,  and  A.  Rosenfeld,  “An  application  of 
relaxation  methods  to  edge  reinforcement,”  IEEE  Trans.  Syst.,  Man, 
Cybern.,  Vol.  SMC-7,  No.  II,  pp.  813-816,  Nov.  1977. 

[5]  A.  Lev,  3.  Zucker,  and  A.  Rosenfeld,  “Iterative  enhancement  of  noisy 
images,”  IEEE  Trans.  Syst.,  Man,  and  Cybern.,  Vol.  SMC-7,  No.  6,  pp. 
435-442,  June  1977. 

[6]  A.  Rosenfeld,  “Iterative  methods  in  image  analysis,”  Proc.  1978  IEEE 
Conf.  Pattern  Recognition  and  Image  Processing,  pp.  181-187,  Jan.  1978. 

(7|  S.  Zucker  and  J.  Mohammed,  “Analysis  of  probabilities  relaxation 
labeling  processes,”  Proc.  1978  IEEE  Conf.  Pattern  Recognition  and 
Image  Processing,  pp.  307-312,  May  1978. 

(8)  S.  Zucker,  E.  Krishnamurthy,  and  R.  Haas,  “Relaxation  processes  for 
scene  labeling;  convergence,  speed,  and  stability,”  IEEE  Trans.  Syst., 
Man,  Cybern.,  Vol.  SMC-8,  No.  1,  pp.  41-48,  Jan.  1978. 

[9)  R.  Hummel  an  S.  Zucker,  “On  the  foundations  of  relaxation  labeling 
processes,”  Proc.  1980  IEEE  Conf.  Pattern  Recognition,  pp.  50-53, 
March  1980. 


[10]  S.  Pi'leg  and  A.  Rosenfeld,  “Determining  compatibility  coefficients  for 
curve  enhancement  relaxation  processes,”  IEEE  Trans.  Syst.,  Man, 
Cybern.,  Vol.  SMC-8,  No.  7,  pp.  548-555,  July  1978. 

[11]  IL  Y.amamoto,  “A  method  of  deriving  compatibility  coefficients  for 
relaxation  operators,”  Computer  Graphics  and  Image  Processing,  Vol.  10, 
pp.  256-271,  1979. 

[12]  J.  Richards,  D.  Landgrebe,  and  P.  Swain,  “Pixel  labeling  by  supervised 
probabilistic  relaxation,”  IEEE  Trans.  Pattern  Analy.  and  Machine 
Intel.,  Vol.  Pv\Ml-3,  No.  2,  pp.  188-191,  March  1981. 

[13]  R.A.  Hummel  and  S.W.  Ziicker,  “On  the  foundations  of  relaxation 
labeling  processes,”  IEEE  Trans.  PAMI,  Vol.  PAMI-5,  No.  3,  pp.  267- 
287,  May  1983. 

[14]  A.J.  Krygiel,  “Synchronous  nets  for  single  instruction  stream  -  multiple 
data  stream  computers,”  D.Sc.  dissertation.  Sever  Institute  of 
Technology,  Washington  University,  St.  Louis,  MO,  May  1980. 

[15]  A.  J.  Krygiel,  “Synchronous  nets  for  single  instruction  stream  -  multiple 
data  stream  computers,”  Proc.  1981  Int’l.  Conf.  Parallel  Processing, 
Aug.  1981,  pp.  266-273. 

[16]  R.J.  McMillen,  H.J.  Siegel,  “Routing  schemes  for  the  augmented  data 
manipulator  network  in  an  MIMD  system,”  IEEE  Trans.  Compt.,  Vol. 
C-31,  pp.  1202-1214,  Dec.  1982. 

[17]  H.J.  Siegel  and  R.J.  McMillen,  “Using  the  augmented  data  manipulator 
network  in  PASM,”  Computer,  Vol.  14,  No.  2,  pp.  25-33,  Feb.  1981. 

[18]  H.J.  Siegel  and  R.J.  McMillen,  “The  multistage  cube:  A  versatile 
interconnection  network,”  Computer,  Vol.  14,  No  i2,  pp.  65-76,  Dec. 
1981. 

[19]  H.J.  McMillen  and  H.J.  Siegel,  “The  hybrid  cube  network,”  IEEE  Proc. 
of  the  Distri.  Data  Acquis.,  Compu.,  and  Control  Sym.,  pp.  11-22,  Dec. 
1980. 

[20]  J.  Richards,  D.  Landgrebe,  and  I’.  Swain,  “On  the  accuracy  of  pixel 
relaxation  labeling,”  IEEE  Trans.  Syst.,  Man,  and  Cybern.,  Vol.  SMC- 
II.  No.  4,  pp.  303-309,  April  1981. 

[21]  P.H.  Swain.  “Fundamentals  of  pattern  recognition  in  remote  sensing,”  in 
Remote  Sensing:  The  Quantitative  Approach,  P.  H,  Swain  and  S.  M. 
Davis,  I'ds.,  McOraw-Hill,  New  York,  1978. 


[22]  L.J.  Siegel,  H.J.  Siegel,  P.H.  Swain,  G.B.  Adams  III.  A.E.  Feather,  G.M. 
Lin,  and  M.R.  Warpenburg,  “Parallel  Image  Processing/Feature 
Extraction  Algorithms  and  Architecture  Emulation:  Interim  Report  for 
Fiscal  1081,”  TR-EE  81-35,  Purdue  Univ.,  School  of  Electrical 
Engineering,  Oct.  1981. 

[23]  J.T.  Kuehn,  “A  more  accurate  complexity  analysis  technique,”  informal 
communications.  Sept.  1982. 

[24]  L.J.  Siegel,  H.J.  Siegel,  and  A.E.  Feather,  “Parallel  processing 
approaches  to  image  correlation,”  /EEE  Trans.  Comput.,  Vol.  C-31,  pp. 
208-218,  Mar.  1982. 

[25]  P.T.  Mueller,  Jr.,  L.J.  Siegel,  and  II. J.  Siegel,  “Parallel  algorithms  for 
the  two-dimensional  FFT,”  Proe.  5th  Int’l.  Con/.  Pattern  Recognition, 
pp.  497-502,  Dec.  1980. 

[26]  D.L.  Tuomenksa,  G.B.  Adams  III,  II. J.  Siegel,  and  O.R.  Mitchell,  “A 
parallel  algorithm  for  contour  extraction:  advantages  and  architectural 
implications,”  lEER  Comput.  Soe.  Conf.  Computer  Vision  and  Pattern 
Recognition,  pp.  336-344,  June  1983. 

[27]  H.J.  Siegel,  L.J.  Siegel,  F.C.  Kemmcrer,  P.T.  Mueller,  Jr.,  H.E.  Smalley, 
Jr.,  and  S.D.  Smith,  “PASM:  A  partitionable  SIMD/MIMD  system  for 
image  processing  and  pattern  recognition,”  IEEE  Trans.  Comput.,  Vol. 
C-30,  pp.  934-947,  Dec.  1981. 


[28]  R.O.  Duda  and  P.E.  Hart,  Pattern  Classification  and  Scene  Analysis, 
Wiley,  New  York,  1973. 

[29]  L.J.  Siegel.  H.J.  Siegel,  an  P.H.  Swain,  “Performance  measures  for 
evaluating  algorithms  for  SIMD  machines,”  IEEE  Trans.  Software 
Engineering,  Vol.  SE-8,  pp.  3I9-33I,  July  1982. 

[30]  K.E.  Batcher,  “MPP  -  a  massively  parallel  processor,”  Proc.  1979  Int’l 
Conf.  Parallel  Processing,  Aug.  1979,  p.  249. 


154 


APPENDIX  - 

PARTITIONING  AND  ROUTING  SCHEMES  OF  THE  CUBE  NETWORK  (IS) 


The  generalized  cube  network  has  N  inputs,  N  outputs,  and  m  =  logoN 
stages.  Each  interchange  box  can  be  set  to  one  of  the  four  legitimate 
configurations  shown  in  Fig.  A.l.  The  m  cube  interchange  functions  are 
defined  as 

cubei(P„_,  •  •  P,Po)  =  P„_,  •  •  •  Pi  +  ,PiPi-,  •  •  P,Po 

where 

0  <  i  <  m 

P;  means  the  complement  of  bit  Pj.  Stage  i  of  the  generalized  cube  network 
contains  the  cube;  interconnection  function,  i.e.,  i/o  lines  of  each  box  differ  in 
the  ith  bit  position  as  shown  in  Fig.  A.l. 

The  cube  network  can  be  partitioned  into  independent  groups.  The  PEs 
in  a  group  must  agree  in  m-s  bits  if  this  group  has  2®  PEs  and  m  =  log-^N.  For 
example,  if  the  cube  network  has  N=8  inputs  and  outputs,  it  may  be 
partitioned  into  two  groups:  group  A  consists  of  ports  0  to  3  and  group  B 
consists  of  ports  4  to  7.  By  setting  all  of  the  interchange  boxes  in  stage  2  to 
straight,  these  two  groups  are  isolated  as  shown  in  Fig.  A. 2.  If  the  interchange 
boxes  in  the  other  stage  are  set  to  straight,  then  two  independent  groups  other 
than  the  one  in  F'ig.  A. 2  are  formed  as  shown  in  Fig.  A.3  and  in  Fig.  A. 4. 
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I>ach  subnetwork  has  the  properties  of  a  cube  network  and  can  be  further 


subdivided  to  form  smaller  independent  subnetworks.  As  shown  in  Fig.  A. 5, 
the  subnetwork  H  in  Fig.  A. 2  is  further  divided  to  form  two  independent 
subnetworks  C'  and  I). 

Any  type  of  a  centralized  control  unit  would  create  a  bottleneck,  but  if 
control  is  distributed  among  the  processors,  each  is  responsible  for  determining 
its  own  control  signals.  Therefore,  the  purpose  of  using  routing  tags  as  headers 
on  messages  is  to  allow  network  control  to  be  distributed  among  the  processors. 
If  the  processors  tising  the  network  arc  configured  as  an  SIMD  machine,  their 
nonbroadcasting  communication  needs  take  the  form  of  interconnection 
functions.  An  interconnection  function  is  a  specification  of  the  way  all  of  the  N 
input  ports  are  connected  to  the  N  output  ports.  An  interconnection  function 
is  said  to  be  admissible  by  the  network  if  there  arc  no  conflicting  requests  for 
interchange  box  settings.  In  establishing  an  admissible  interconnection 
function,  routing  tags  are  used  by  all  active  processors  simultaneously.  In 
MIMI)  mode,  the  routing  requests  occur  at  random  intervals. 

l  or  one-to-one  connection,  the  routing  tag,  T,  is  calculated  as  T  =  S  0D 
in  which  S  is  the  input  port  and  D  is  the  output  port.  The  operator  0  denotes 
“exclusive  or.”  For  example,  if  S=5  =  101  and  D=3— Oil,  then 
T  =  S  0  D  =  101  0  011  =  110.  The  interchange  box  in  stage  i  only  checks  Tj 
which  is  the  ith  bit  of  routing  tag  T.  If  Tj  =  1,  set  interchange  box  to 
exchange.  If  Tj  =0,  set  interchange  box  to  straight.  Fig.  A. 6  shows  the  path 
established  between  input  port  5  and  output  port  3.  The  incoming  tag  is  also 
the  same  as  the  return  tag.  Therefore,  it  can  implement  handshaking. 

For  one-to-many  broadcast,  the  routing  tag  contains  two  parts;  R  contains 
routing  information  and  B  contains  broadca.st  information.  11  contains  m  bits. 
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so  does  B.  For  broadcasting,  the  destination  group  must  bo  a  group  of  size 
equal  to  2b  In  tliis  group,  there  must  be  at  most  j  bits  that  disagree  among 
any  pair  of  the  addresses,  and  m-j  bit  positions  in  which  all  these  addresses 
must  agree.  For  example,  input  port  8=5  =  101  broadcasts  messages  to 
output  ports  2,  3,  0,  and  7,  in  which  all  addres.ses  agree  in  the  first  bit  position 
(the  least  significant  bit  is  the  0th  bit).  Then, 
R  =  S  0  Dj  =  S  0  Dj  =  101  ©  010  =  1 1  i  where  D;  is  any  one  of  I  destination 
addresses  and  B  =  Dj  0  =  010  0  1 1 1  =  101  where  Dj  and  Di^  must  have 
hamming  distance  of  2.  The  interchange  boxes  in  stage  i  check  B;  first.  If 
B;  =  1,  set  interchange  box  to  either  upper  broadcast  or  lower  broadcast.  If 
Bi=0,  then  check  R;.  If  Ri=0,  set  interchange  box  to  straight,  otherwise  set 
exchange.  Fig.  A. 7  shows  input  port,  S=5,  broadcasts  messages  to  output 
ports,  D  =  2, 3, 6  and  7. 


